Hi R users,
I'm new to R, so my problem is probably pretty simple, but I'm stuck.
My data consist of two variables, co2 and temp, and one
treatment (L_group). The sample size differs among the treatments, so
I wanted to equalize the sample size among the three groups (A, B and C) of the
treatment.
To do this, I used a subsampling technique. With
subsampling, the data drawn from the three groups of the treatment differ
each time.
So I want to run a regression (co2~temp) on 100
subsamples for each group of the treatment (i.e. subsample 100 times).
That means I will have 100 regression equations. Later, I want to compare
the slope of the
regression among the three groups. Is there a simple way to write a loop so that I
can compare them?
Thanks in advance!
Angela
===============Here is the example:
dat<-structure(list(co2 = c(0.15, 0.148, 0.125, 0.145, 0.138, 0.23,
0.26, 0.35, 0.41, 0.45, 0.39, 0.42, 0.4, 0.43, 0.26, 0.3, 0.34,
0.141, 0.145, 0.153, 0.151, 0.128, 0.23, 0.26), temp = c(0.0119,
0.0122, 0.0089, 0.0115, 0.0101, 0.055, 0.097, 0.22, 0.339, 0.397,
0.257, 0.434, 0.318, 0.395, 0.087, 0.13, 0.154, 0.0107, 0.0112,
0.0119, 0.012, 0.0092, 0.055, 0.089), L_group = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("A", "B",
"C"), class = "factor")), .Names = c("co2",
"temp", "L_group"), class = "data.frame",
row.names = c(NA, -24L
))
head(dat)
library(sampling)
# strata.sampling -----
strata.sampling <- function(data, group,size, method = NULL) {
  require(sampling)
  if (is.null(method)) method <- "srswor"
  temp <- data[order(data[[group]]), ]
  ifelse(length(size)> 1,
         size <- size,
         ifelse(size < 1,
                size <- round(table(temp[group]) * size),
                size <- rep(size, times=length(table(temp[group])))))
  strat = strata(temp, stratanames = names(temp[group]),
                 size = size, method = method)
  getdata(temp, strat)
}
#--------------------------------------------------
sub_dat <- strata.sampling(dat, 'L_group', 4)
Lmodel_subdata1<-lm(co2~temp, data=sub_dat)
Lmodel_subdata1 # coef
sub_dat2 <- strata.sampling(dat, 'L_group', 4)
Lmodel_subdata2<-lm(co2~temp, data=sub_dat2)
Lmodel_subdata2 # coef
and so on..... (for 100 times)
Table<-rbind(Lmodel_subdata1$coef, Lmodel_subdata2$coef, ....)
Comment inline

On 17/02/2015 12:40, Angela Smith wrote:
> Hi R user,
> I'm new to R so
> my problem is probably pretty simple but I'm stuck:
>
> my data is consist of 2 variables: co2, temp and one
> treatment (l_group). The sample size is different among the treatments. so
> that, I wanted to make equal sample size among three groups (A,B and C) of the
> treatment.

Not sure whether that is necessary for regression but you did not tell
us why you want to do that.

> For this one, I used subsamples technique. Using
> subsample, each time the data are different among the three groups of the
> treatment.
>
> so that I want to run regression (co2~temp) for a 100
> subsamples for each group of treatment (100 times subsample).

The usual way to do this is to store the subsamples in a list and then
write a function and use lapply, say to store your models. You then have
another list to which you can then apply the extractor function of your
choice.

> it means that I will have 100 regression equations. Later, I want to
> compare the slope of the regression among the three groups. is there
> simple way to make a loop so that I can compare it?
>
> Thanks in advance!
>
> Angela
>
> [example data and code snipped]

--
Michael
http://www.dewey.myzen.co.uk
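The list-then-lapply approach described above could look roughly like the sketch below. It is only an illustration under some assumptions: `draw_subsample()` is a hypothetical base-R helper standing in for the poster's `strata.sampling()`, the subsample size of 4 per group is taken from the poster's example calls, the regression is fitted to the pooled subsample as in the original code, and the seed is arbitrary.

```r
# Example data from the original post, rebuilt with data.frame()
dat <- data.frame(
  co2 = c(0.15, 0.148, 0.125, 0.145, 0.138, 0.23, 0.26, 0.35,
          0.41, 0.45, 0.39, 0.42, 0.4, 0.43,
          0.26, 0.3, 0.34, 0.141, 0.145, 0.153, 0.151, 0.128, 0.23, 0.26),
  temp = c(0.0119, 0.0122, 0.0089, 0.0115, 0.0101, 0.055, 0.097, 0.22,
           0.339, 0.397, 0.257, 0.434, 0.318, 0.395,
           0.087, 0.13, 0.154, 0.0107, 0.0112, 0.0119, 0.012, 0.0092,
           0.055, 0.089),
  L_group = factor(rep(c("A", "B", "C"), times = c(8, 6, 10)))
)

# Hypothetical helper (not from the thread): draw `size` rows without
# replacement from each level of the grouping column
draw_subsample <- function(data, group, size) {
  idx_by_group <- split(seq_len(nrow(data)), data[[group]])
  rows <- unlist(lapply(idx_by_group,
                        function(idx) idx[sample.int(length(idx), size)]))
  data[rows, ]
}

set.seed(1)  # arbitrary seed, only so the sketch is reproducible

# Step 1: store the 100 subsamples in a list
subsamples <- replicate(100, draw_subsample(dat, "L_group", 4),
                        simplify = FALSE)

# Step 2: fit one model per subsample and keep the models in another list
models <- lapply(subsamples, function(d) lm(co2 ~ temp, data = d))

# Step 3: apply an extractor (here coef) to every stored model
coef_table <- t(sapply(models, coef))
head(coef_table)
```

`coef_table` plays the role of the hand-built `Table` in the question: one row per subsample, with the intercept and the temp slope as columns. Any other extractor, for example `function(m) summary(m)$r.squared`, could be swapped in at step 3.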
Expanding a bit on Michael's answer, you don't need the sampling package
for this, just the sample.int() function to draw a random set of integers that
you will use to extract rows from each of your groups. Then write a function that
returns what you want (the regression slope from each group) and use that
function with the replicate() function. Your problem is a good way to illustrate
the lapply(), sapply(), replicate() family of functions in R:
# Split the data into a list of data frames
datlist <- split(dat, dat$L_group)
# Write a function to draw the sample and perform the regression on each group
slopes <- function(lst) {
  # Get the minimum sample size
  minsize <- min(sapply(lst, nrow))
  # Draw sample (row numbers) of size minsize from each group
  samlist <- lapply(sapply(lst, nrow), sample.int, size=minsize)
  # Extract sample from each group
  samples <- lapply(names(lst), function(x) lst[[x]][samlist[[x]],])
  # Run the regressions for each group and extract the slopes
  results <- sapply(samples, function(x) coef(lm(co2~temp, x))[2])
  # Use the group names of the list passed in to label the slopes
  names(results) <- names(lst)
  return(results)
}
# You can get a single set of results with
(results <- slopes(datlist))
# A B C
# 1.0128392 0.2658041 1.3423786
# To get 100 runs
many <- t(replicate(100, slopes(datlist)))
head(many)
# A B C
# [1,] 1.4326103 0.2658041 1.357475
# [2,] 1.4754324 0.2658041 1.309208
# [3,] 0.9838589 0.2658041 1.408987
# [4,] 0.9993144 0.2658041 1.354297
# [5,] 1.0134187 0.2658041 1.397112
# [6,] 1.4922856 0.2658041 1.312531
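Once the 100 x 3 matrix of slopes exists, one simple way to compare the groups is to summarise each column. The sketch below is an assumption about what "compare" could mean here, not something proposed in the thread: it rebuilds the data, uses a condensed variant of slopes(), and reports quantiles of the slopes and of the A minus C difference. The seed is arbitrary. Group B has only 6 rows, which is also the minimum group size, so every draw for B uses all of its rows; that is why its slope is identical in every run.

```r
# Example data from the original post, rebuilt with data.frame()
dat <- data.frame(
  co2 = c(0.15, 0.148, 0.125, 0.145, 0.138, 0.23, 0.26, 0.35,
          0.41, 0.45, 0.39, 0.42, 0.4, 0.43,
          0.26, 0.3, 0.34, 0.141, 0.145, 0.153, 0.151, 0.128, 0.23, 0.26),
  temp = c(0.0119, 0.0122, 0.0089, 0.0115, 0.0101, 0.055, 0.097, 0.22,
           0.339, 0.397, 0.257, 0.434, 0.318, 0.395,
           0.087, 0.13, 0.154, 0.0107, 0.0112, 0.0119, 0.012, 0.0092,
           0.055, 0.089),
  L_group = factor(rep(c("A", "B", "C"), times = c(8, 6, 10)))
)
datlist <- split(dat, dat$L_group)

# Condensed variant of slopes(): one slope per group, each group
# subsampled down to the size of the smallest group
slopes <- function(lst) {
  minsize <- min(sapply(lst, nrow))
  sapply(lst, function(g) {
    rows <- sample.int(nrow(g), size = minsize)
    unname(coef(lm(co2 ~ temp, data = g[rows, ]))[2])
  })
}

set.seed(42)  # arbitrary seed, only so the sketch is reproducible
many <- t(replicate(100, slopes(datlist)))

# Median and 2.5% / 97.5% quantiles of the slope in each group
slope_summary <- apply(many, 2, quantile, probs = c(0.025, 0.5, 0.975))
round(slope_summary, 3)

# Run-by-run difference between two groups, here A minus C
diff_AC <- many[, "A"] - many[, "C"]
quantile(diff_AC, probs = c(0.025, 0.5, 0.975))

# Optional visual comparison
boxplot(many, xlab = "L_group", ylab = "slope of co2 ~ temp")
```

These summaries only describe how the slopes vary across subsamples; they are not a formal test of whether the slopes differ.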
-------------------------------------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352
-----Original Message-----
From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of Michael Dewey
Sent: Tuesday, February 17, 2015 9:52 AM
To: Angela Smith; r-help at r-project.org
Subject: Re: [R] subsamples and regressions for 100 times
Comment inline
On 17/02/2015 12:40, Angela Smith wrote:
> Hi R user,
> I'm new to R so
> my problem is probably pretty simple but I'm stuck:
>
> my data is consist of 2 variables: co2, temp and one
> treatment (l_group). The sample size is different among the treatments. so
> that, I wanted to make equal sample size among three groups (A,B and C) of
> the treatment.

Not sure whether that is necessary for regression but you did not tell
us why you want to do that.

> For this one, I used subsamples technique. Using
> subsample, each time the data are different among the three groups of the
> treatment.
>
> so that I want to run regression (co2~temp) for a 100
> subsamples for each group of treatment (100 times subsample).

The usual way to do this is to store the subsamples in a list and then
write a function and use lapply, say to store your models. You then have
another list to which you can then apply the extractor function of your
choice.

> it means that I will have 100 regression equations. Later, I want to
> compare the slope of the regression among the three groups. is there
> simple way to make a loop so that I can compare it?
>
> Thanks in advance!
>
> Angela
>
--
Michael
http://www.dewey.myzen.co.uk
______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.