Elena Guijarro
2011-Sep-28 08:41 UTC
[R] apply lm function to dataset split by two variables
Dear all,
I am not fluent in R and am struggling to 1) apply a lm to a weight-size
dataset, thus the model has to run separately for each species, each
year; 2) extract coefs, r-squared, n, etc. The data look like this:
year sps cm w
2009 50 16 22
2009 50 17 42
2009 50 18 45
2009 51 15 45
2009 51 16 53
2009 51 17 73
2010 50 15 22
2010 50 16 41
2010 50 16 21
2010 50 17 36
2010 51 15 43
2010 51 16 67
2010 51 17 79
The following script works for data from a single year, but I don't find
a way to subset the data by sps AND year and get the function running:
f <- function(data) lm(log(w) ~ log(cm+0.5), data = data)
v <- lapply(split(data, data$sps), f)
and then I can extract the data with this script from Peter Solymos
(although I do not get the number of points used in the analysis):
myFun <-
function(lm)
{
out <- c(lm$coefficients[1],
lm$coefficients[2],
length(lm$run1$model$y),
summary(lm)$coefficients[2,2],
pf(summary(lm)$fstatistic[1], summary(lm)$fstatistic[2],
summary(lm)$fstatistic[3], lower.tail = FALSE),
summary(lm)$r.squared)
names(out) <-
c("intercept","slope","n","slope.SE","p.value","r.squared")
return(out)}
results <- list()
for (i in 1:length(v)) results[[names(v)[i]]] <- myFun(v[[i]])
as.data.frame(results)
I have checked the plyr package, but the example that fits my data best
uses a for loop and I would like to avoid these. I have also tried the
following (among many other options) without results:
v<-tapply(data$w,list(data$cm,data$year),f)
Error in is.function(FUN) : 'FUN' is missing
Any ideas?
Thanks for your help,
Elena
[[alternative HTML version deleted]]
Dennis Murphy
2011-Sep-28 10:56 UTC
[R] apply lm function to dataset split by two variables
Hi:
Here's one way to do it with the plyr package:
dd <- read.table(textConnection("
year sps cm w
2009 50 16 22
2009 50 17 42
2009 50 18 45
2009 51 15 45
2009 51 16 53
2009 51 17 73
2010 50 15 22
2010 50 16 41
2010 50 16 21
2010 50 17 36
2010 51 15 43
2010 51 16 67
2010 51 17 79"), header = TRUE)
closeAllConnections()
library('plyr')
# Input a data frame, output a list of lm objects
modlist <- dlply(dd, .(sps, year), function(d) lm(w ~ cm, data = d))
# For use in plyr's ldply() function, the utility function should
# return a data frame. We save some effort in simple linear regression
# by noting that the two-sided p-value of the t-test of zero slope is the
# same as that of the overall F test:
extractfun <- function(m) {
cf <- coef(m)
tinfo <- summary(m)$coefficients[2, c(2, 4)]
r2 <- summary(m)$r.squared
data.frame(intercept = cf[1], slope = cf[2], n = length(resid(m)),
slope.se = tinfo[1], pval = tinfo[2], Rsq = r2)
}
# Take a list (of models) as input and output a data frame:
ldply(modlist, extractfun)
sps year intercept slope n slope.se pval Rsq
1 50 2009 -159.1667 11.5 3 4.907477 0.2567749 0.8459488
2 50 2010 -82.0000 7.0 4 7.141428 0.4303481 0.3245033
3 51 2009 -167.0000 14.0 3 3.464102 0.1544210 0.9423077
4 51 2010 -225.0000 18.0 3 3.464102 0.1210377 0.9642857
HTH,
Dennis
On Wed, Sep 28, 2011 at 1:41 AM, Elena Guijarro
<elena.guijarro at vi.ieo.es> wrote:>
> Dear all,
>
> I am not fluent in R and am struggling to 1) apply a lm to a weight-size
> dataset, thus the model has to run separately for each species, each
> year; 2) extract coefs, r-squared, n, etc. The data look like this:
>
> year ? ?sps ? ? cm ? ? ?w
> 2009 ? ?50 ? ? ?16 ? ? ?22
> 2009 ? ?50 ? ? ?17 ? ? ?42
> 2009 ? ?50 ? ? ?18 ? ? ?45
> 2009 ? ?51 ? ? ?15 ? ? ?45
> 2009 ? ?51 ? ? ?16 ? ? ?53
> 2009 ? ?51 ? ? ?17 ? ? ?73
> 2010 ? ?50 ? ? ?15 ? ? ?22
> 2010 ? ?50 ? ? ?16 ? ? ?41
> 2010 ? ?50 ? ? ?16 ? ? ?21
> 2010 ? ?50 ? ? ?17 ? ? ?36
> 2010 ? ?51 ? ? ?15 ? ? ?43
> 2010 ? ?51 ? ? ?16 ? ? ?67
> 2010 ? ?51 ? ? ?17 ? ? ?79
>
>
>
> The following script works for data from a single year, but I don't
find
> a way to subset the data by sps AND year and get the function running:
>
> f <- function(data) lm(log(w) ~ log(cm+0.5), data = data)
> v <- lapply(split(data, data$sps), f)
>
> and then I can extract the data with this script from Peter Solymos
> (although I do not get the number of points used in the analysis):
>
> myFun <-
> function(lm)
> {
> out <- c(lm$coefficients[1],
> ? ? lm$coefficients[2],
> ? ? length(lm$run1$model$y),
> ? ? summary(lm)$coefficients[2,2],
> ? ? pf(summary(lm)$fstatistic[1], summary(lm)$fstatistic[2],
> summary(lm)$fstatistic[3], lower.tail = FALSE),
> ? ? summary(lm)$r.squared)
> names(out) <-
c("intercept","slope","n","slope.SE","p.value","r.squared")
> return(out)}
>
> results <- list()
> for (i in 1:length(v)) results[[names(v)[i]]] <- myFun(v[[i]])
> as.data.frame(results)
>
> I have checked the plyr package, but the example that fits my data best
> uses a for loop and I would like to avoid these. I have also tried the
> following (among many other options) without results:
>
> v<-tapply(data$w,list(data$cm,data$year),f)
>
> Error in is.function(FUN) : 'FUN' is missing
>
> Any ideas?
>
> Thanks for your help,
>
> Elena
>
>
> ? ? ? ?[[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
Michael Dewey
2011-Sep-28 11:48 UTC
[R] apply lm function to dataset split by two variables
At 09:41 28/09/2011, Elena Guijarro wrote:>Dear all, > >I am not fluent in R and am struggling to 1) apply a lm to a weight-size >dataset, thus the model has to run separately for each species, each >year; 2) extract coefs, r-squared, n, etc. The data look like this: > >year sps cm w >2009 50 16 22 >2009 50 17 42 >2009 50 18 45 >2009 51 15 45 >2009 51 16 53 >2009 51 17 73 >2010 50 15 22 >2010 50 16 41 >2010 50 16 21 >2010 50 17 36 >2010 51 15 43 >2010 51 16 67 >2010 51 17 79 > > > >The following script works for data from a single year, but I don't find >a way to subset the data by sps AND year and get the function running:I think lmList from the nlme package does this for you. It comes with some other helpful extractors or you can write your own as you have done. Personally I would return a list rather than a vector but that is a matter of taste.>f <- function(data) lm(log(w) ~ log(cm+0.5), data = data) >v <- lapply(split(data, data$sps), f) > >and then I can extract the data with this script from Peter Solymos >(although I do not get the number of points used in the analysis): > >myFun <- >function(lm) >{ >out <- c(lm$coefficients[1], > lm$coefficients[2], > length(lm$run1$model$y), > summary(lm)$coefficients[2,2], > pf(summary(lm)$fstatistic[1], summary(lm)$fstatistic[2], >summary(lm)$fstatistic[3], lower.tail = FALSE), > summary(lm)$r.squared) >names(out) <- c("intercept","slope","n","slope.SE","p.value","r.squared") >return(out)} > >results <- list() >for (i in 1:length(v)) results[[names(v)[i]]] <- myFun(v[[i]]) >as.data.frame(results) > >I have checked the plyr package, but the example that fits my data best >uses a for loop and I would like to avoid these. I have also tried the >following (among many other options) without results: > >v<-tapply(data$w,list(data$cm,data$year),f) > >Error in is.function(FUN) : 'FUN' is missing > >Any ideas? > >Thanks for your help, > >Elena > > > [[alternative HTML version deleted]]Michael Dewey info at aghmed.fsnet.co.uk http://www.aghmed.fsnet.co.uk/home.html