Hi:
Time to jack up your level of R knowledge, courtesy of the apply family.
The 'R way' to do what you want is to split the data by species into list
components, run lm() on each component and save the resulting lm objects in
a list. The next trick is to figure out how to extract what you want, which
may require a bit more ingenuity in delving into aRcana :)
-----
Aside:
To reinforce Joshua's point, variable names containing spaces that are not
explicitly enclosed in quotes are bad practice, especially when someone who
wants to help tries to copy and paste your data into his/her R session:
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
  line 1 did not have 4 elements
Your header line ("species id o2con bm") has four fields, so R expected four
columns of data, but each row provides only three. In the future, it's a good
idea to include your data example with dput(), which outputs a plain-text
representation of the object:
> dput(d)
structure(list(species = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L,
2L, 3L, 3L, 3L, 3L, 5L, 5L, 5L, 5L), o2con = c(0.5, 0.6, 0.4,
0.4, 0.5, 0.3, 0.4, 0.5, 0.7, 0.9, 0.3, 0.7, 0.4, 0.3, 0.3, 0.6,
0.9, 0.2), bm = c(5L, 2L, 4L, 2L, 3L, 7L, 8L, 3L, 4L, 2L, 6L,
2L, 1L, 7L, 2L, 1L, 7L, 5L)), .Names = c("species", "o2con",
"bm"), class = "data.frame", row.names = c(NA, -18L))
This is easily copied and pasted into anyone's R session....but I digress.
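If you would rather fix the pasted table itself than use dput(), here is a minimal sketch. It assumes the header is rewritten so that every name is a single token ("species" instead of "species id"; that renaming is only an illustration, not something taken from the original post), and it uses just the first few rows of the sample data.

```r
# First rows of the sample data, with the header renamed so that it has
# three space-free names matching the three data columns (assumed renaming).
txt <- "species o2con bm
1 0.5 5
1 0.6 2
2 0.3 7"

# read.table() can parse a character string through its 'text' argument
d_small <- read.table(text = txt, header = TRUE)
str(d_small)
```

With the header and the rows agreeing on the number of fields, the scan() error above should no longer appear.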
------
Calling your data frame d, here's how to run the same regression model on
all species:
# Create a function to perform the modeling, taking a data frame df as input
f <- function(df) lm(o2con ~ bm, data = df)
# Use lapply() to apply the function to each 'split' of the data, by species:
v <- lapply(split(d, d$species), f)
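Since the question asked for a loop, here is a sketch of the same fit written as an explicit for loop, for comparison. The data frame is rebuilt from the sample data so the block runs on its own; the name 'models' is just a placeholder I chose.

```r
# Sample data from the original post, with the species column named 'species'
d <- data.frame(
  species = rep(c(1L, 2L, 3L, 5L), times = c(5, 5, 4, 4)),
  o2con = c(0.5, 0.6, 0.4, 0.4, 0.5, 0.3, 0.4, 0.5, 0.7, 0.9,
            0.3, 0.7, 0.4, 0.3, 0.3, 0.6, 0.9, 0.2),
  bm = c(5, 2, 4, 2, 3, 7, 8, 3, 4, 2, 6, 2, 1, 7, 2, 1, 7, 5)
)

# One lm object per species, stored in a list keyed by the species code
models <- list()
for (sp in unique(d$species)) {
  models[[as.character(sp)]] <- lm(o2con ~ bm, data = d[d$species == sp, ])
}

names(models)
```

The loop and the lapply(split(...)) one-liner should give the same fits; the lapply() version simply avoids the bookkeeping of creating and filling the list by hand.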
# v is a list object, where each component of the list is an lm object,
# which itself is a list. In other words, it's a list of lists. do.call() is a
# very useful function that calls a function once, with the components of a
# list as its arguments. rbind and cbind are commonly used with it to slurp
# together common elements from each component of a list.
# Pulling out the coefficients from each model:
> do.call(rbind, lapply(v, coef))
  (Intercept)          bm
1   0.5176471 -0.01176471
2   0.9253731 -0.07611940
3   0.5942308 -0.04230769
5   0.3351648  0.04395604
# Extract the r-squared values from each model:
g <- function(m) summary(m)$r.squared
> do.call(rbind, lapply(v, g))
        [,1]
1 0.03361345
2 0.66932578
3 0.43291592
5 0.14652015
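The same pattern extends to pulling several quantities per model in one pass. The sketch below is one way to do it, not the only one: the helper name 'h' and the choice of slope, r-squared and slope p-value are my own picks for illustration. The data frame is rebuilt from the sample data so the block runs on its own.

```r
# Sample data from the original post, with the species column named 'species'
d <- data.frame(
  species = rep(c(1L, 2L, 3L, 5L), times = c(5, 5, 4, 4)),
  o2con = c(0.5, 0.6, 0.4, 0.4, 0.5, 0.3, 0.4, 0.5, 0.7, 0.9,
            0.3, 0.7, 0.4, 0.3, 0.3, 0.6, 0.9, 0.2),
  bm = c(5, 2, 4, 2, 3, 7, 8, 3, 4, 2, 6, 2, 1, 7, 2, 1, 7, 5)
)
v <- lapply(split(d, d$species), function(df) lm(o2con ~ bm, data = df))

# Hypothetical helper: return a named vector of summary values for one model
h <- function(m) {
  s <- summary(m)
  c(slope     = unname(coef(m)["bm"]),
    r.squared = s$r.squared,
    p.value   = s$coefficients["bm", "Pr(>|t|)"])
}

# One row per species, one column per extracted quantity
out <- do.call(rbind, lapply(v, h))
out
```

Because every model returns a vector of the same length here, rbind() stacks them without any recycling.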
# But you have to be careful...e.g., since you have unequal sample sizes per
# species,
> do.call(cbind, lapply(v, resid))
            1           2            3          5
1  0.04117647 -0.09253731 -0.040384615 -0.1230769
2  0.10588235  0.08358209  0.190384615  0.2208791
3 -0.07058824 -0.19701493 -0.151923077  0.2571429
4 -0.09411765  0.07910448  0.001923077 -0.3549451
5  0.01764706  0.12686567 -0.040384615 -0.1230769
Warning message:
In function (..., deparse.level = 1) :
number of rows of result is not a multiple of vector length (arg 3)
Notice how the first residual is recycled in each of groups 3 and 5. That's
a potential gotcha.
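One way around the recycling, sketched here under the assumption that a 'long' layout suits your purposes, is to leave the residuals in a list and stack them with a species label rather than cbind() them. The names 'res_list' and 'res_long' are placeholders; the data frame is rebuilt from the sample data so the block runs on its own.

```r
# Sample data from the original post, with the species column named 'species'
d <- data.frame(
  species = rep(c(1L, 2L, 3L, 5L), times = c(5, 5, 4, 4)),
  o2con = c(0.5, 0.6, 0.4, 0.4, 0.5, 0.3, 0.4, 0.5, 0.7, 0.9,
            0.3, 0.7, 0.4, 0.3, 0.3, 0.6, 0.9, 0.2),
  bm = c(5, 2, 4, 2, 3, 7, 8, 3, 4, 2, 6, 2, 1, 7, 2, 1, 7, 5)
)
v <- lapply(split(d, d$species), function(df) lm(o2con ~ bm, data = df))

# Keep residuals as a list: each component keeps its own length
res_list <- lapply(v, resid)
n_per_species <- sapply(res_list, length)

# Stack into long format: one row per observation, labelled by species
res_long <- data.frame(
  species = rep(names(res_list), times = n_per_species),
  resid   = unlist(res_list, use.names = FALSE)
)

table(res_long$species)
```

Each species then contributes exactly as many rows as it has observations, so nothing gets recycled.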
This gives you a small glimpse into the power that R can deliver in data
analysis.
HTH,
Dennis
On Sun, Jul 18, 2010 at 2:29 PM, karmakiller <roisinmoriarty@gmail.com> wrote:
>
> Hi All,
>
> I have a large data set with many columns of data. One of these columns is a
> species identifier and the remainder are variables such as temperature or
> mass. Currently I am carrying out a single regression on subsets of the data
> set, e.g. separated data sets with only the data from one species at a time.
> I have been searching for a thread that will help me to understand how best
> to repeat this process for each different species identifier in that
> variable column. I can’t seem to find one that is similar to what I am
> trying to do. It might be the case that I am not looking for the right thing
> or that I do not fully understand the process.
>
> How do I run a simple loop that produces a regression for each species as
> identified in the variable species id, which is one column in the large data
> set that I am using?
>
> Simple regression that I wish to repeat
>
> data<- read.table("…/STUDY.txt",header=T)
> names(data)
> model<- with(data,{lm(o2con~bm)})
> summary(model)
>
>
> sample data set
>
> species id o2con bm
> 1 0.5 5
> 1 0.6 2
> 1 0.4 4
> 1 0.4 2
> 1 0.5 3
> 2 0.3 7
> 2 0.4 8
> 2 0.5 3
> 2 0.7 4
> 2 0.9 2
> 3 0.3 6
> 3 0.7 2
> 3 0.4 1
> 3 0.3 7
> 5 0.3 2
> 5 0.6 1
> 5 0.9 7
> 5 0.2 5
>
> I would be very grateful for some help with this. I really like using R and
> I can usually figure out what I want to do but I have been trying to figure
> this out for a while now and I am getting nowhere.
>
> Thank you.