Elena Guijarro
2011-Sep-28 08:41 UTC
[R] apply lm function to dataset split by two variables
Dear all,

I am not fluent in R and am struggling to 1) apply a lm to a weight-size dataset, thus the model has to run separately for each species, each year; 2) extract coefs, r-squared, n, etc. The data look like this:

year  sps  cm  w
2009  50   16  22
2009  50   17  42
2009  50   18  45
2009  51   15  45
2009  51   16  53
2009  51   17  73
2010  50   15  22
2010  50   16  41
2010  50   16  21
2010  50   17  36
2010  51   15  43
2010  51   16  67
2010  51   17  79

The following script works for data from a single year, but I don't find a way to subset the data by sps AND year and get the function running:

f <- function(data) lm(log(w) ~ log(cm+0.5), data = data)
v <- lapply(split(data, data$sps), f)

and then I can extract the data with this script from Peter Solymos (although I do not get the number of points used in the analysis):

myFun <- function(lm) {
  out <- c(lm$coefficients[1],
    lm$coefficients[2],
    length(lm$run1$model$y),
    summary(lm)$coefficients[2,2],
    pf(summary(lm)$fstatistic[1], summary(lm)$fstatistic[2],
       summary(lm)$fstatistic[3], lower.tail = FALSE),
    summary(lm)$r.squared)
  names(out) <- c("intercept","slope","n","slope.SE","p.value","r.squared")
  return(out)}

results <- list()
for (i in 1:length(v)) results[[names(v)[i]]] <- myFun(v[[i]])
as.data.frame(results)

I have checked the plyr package, but the example that fits my data best uses a for loop and I would like to avoid these. I have also tried the following (among many other options) without results:

v<-tapply(data$w,list(data$cm,data$year),f)

Error in is.function(FUN) : 'FUN' is missing

Any ideas?

Thanks for your help,

Elena
Dennis Murphy
2011-Sep-28 10:56 UTC
[R] apply lm function to dataset split by two variables
Hi:

Here's one way to do it with the plyr package:

dd <- read.table(textConnection("
year sps cm w
2009 50 16 22
2009 50 17 42
2009 50 18 45
2009 51 15 45
2009 51 16 53
2009 51 17 73
2010 50 15 22
2010 50 16 41
2010 50 16 21
2010 50 17 36
2010 51 15 43
2010 51 16 67
2010 51 17 79"), header = TRUE)
closeAllConnections()

library('plyr')

# Input a data frame, output a list of lm objects
modlist <- dlply(dd, .(sps, year), function(d) lm(w ~ cm, data = d))

# For use in plyr's ldply() function, the utility function should
# return a data frame. We save some effort in simple linear regression
# by noting that the two-sided p-value of the t-test of zero slope is the
# same as that of the overall F test:
extractfun <- function(m) {
  cf <- coef(m)
  tinfo <- summary(m)$coefficients[2, c(2, 4)]
  r2 <- summary(m)$r.squared
  data.frame(intercept = cf[1], slope = cf[2], n = length(resid(m)),
             slope.se = tinfo[1], pval = tinfo[2], Rsq = r2)
}

# Take a list (of models) as input and output a data frame:
ldply(modlist, extractfun)

  sps year intercept slope n slope.se      pval       Rsq
1  50 2009 -159.1667  11.5 3 4.907477 0.2567749 0.8459488
2  50 2010  -82.0000   7.0 4 7.141428 0.4303481 0.3245033
3  51 2009 -167.0000  14.0 3 3.464102 0.1544210 0.9423077
4  51 2010 -225.0000  18.0 3 3.464102 0.1210377 0.9642857

HTH,
Dennis
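For readers who would rather stay in base R, the same split by two variables can be sketched with split() on a list of grouping columns. This is only an illustration and not part of the replies in this thread. It assumes the data frame is called dd, as in the plyr example, it fits the log-log model from the original question rather than w ~ cm, and the helper name extract_fit is made up here.

```r
# Rebuild the example data so the sketch is self-contained
dd <- read.table(text = "
year sps cm w
2009 50 16 22
2009 50 17 42
2009 50 18 45
2009 51 15 45
2009 51 16 53
2009 51 17 73
2010 50 15 22
2010 50 16 41
2010 50 16 21
2010 50 17 36
2010 51 15 43
2010 51 16 67
2010 51 17 79", header = TRUE)

# Passing a list of factors to split() gives one piece per sps/year
# combination; the pieces are named like "50.2009"
groups <- split(dd, list(dd$sps, dd$year), drop = TRUE)

# Fit the model from the original question to each piece
fits <- lapply(groups, function(d) lm(log(w) ~ log(cm + 0.5), data = d))

# Hypothetical helper: one-row data frame of summary values per model
extract_fit <- function(m) {
  s <- summary(m)
  data.frame(intercept = coef(m)[[1]],
             slope     = coef(m)[[2]],
             n         = nobs(m),               # observations used in the fit
             slope.se  = s$coefficients[2, 2],
             p.value   = s$coefficients[2, 4],  # t-test of zero slope
             r.squared = s$r.squared)
}

# Stack the one-row data frames; row names carry the group labels
results <- do.call(rbind, lapply(fits, extract_fit))
results
```

Using nobs() (or length(resid(m)), as in the plyr version) returns the number of points in each fit, which is the value the original myFun was missing.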
Michael Dewey
2011-Sep-28 11:48 UTC
[R] apply lm function to dataset split by two variables
At 09:41 28/09/2011, Elena Guijarro wrote:

>The following script works for data from a single year, but I don't find
>a way to subset the data by sps AND year and get the function running:

I think lmList from the nlme package does this for you. It comes with some other helpful extractors or you can write your own as you have done. Personally I would return a list rather than a vector but that is a matter of taste.

Michael Dewey
info at aghmed.fsnet.co.uk
http://www.aghmed.fsnet.co.uk/home.html
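A minimal sketch of how the lmList suggestion might look, offered as an illustration rather than as the poster's own code. It assumes the two grouping variables are first combined into a single factor (the column name grp is hypothetical), and it rebuilds the example data under the name dd.

```r
library(nlme)

# Rebuild the example data so the sketch is self-contained
dd <- read.table(text = "
year sps cm w
2009 50 16 22
2009 50 17 42
2009 50 18 45
2009 51 15 45
2009 51 16 53
2009 51 17 73
2010 50 15 22
2010 50 16 41
2010 50 16 21
2010 50 17 36
2010 51 15 43
2010 51 16 67
2010 51 17 79", header = TRUE)

# Assumption: combine species and year into one grouping factor,
# with levels such as "50.2009"
dd$grp <- interaction(dd$sps, dd$year, drop = TRUE)

# One lm fit per level of grp, using the model from the original question
fits <- lmList(log(w) ~ log(cm + 0.5) | grp, data = dd)

# coef() on an lmList returns one row of coefficients per group
coef(fits)

# The object is a list of lm fits, so other quantities can be pulled
# out one model at a time
sapply(fits, function(m) summary(m)$r.squared)
```

Combining the grouping variables with interaction() keeps the formula to a single grouping term, which is the simplest form lmList accepts.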