Hi R users,

I'm new to R, so my problem is probably pretty simple, but I'm stuck.

My data consist of two variables, co2 and temp, and one treatment factor (L_group) with three groups (A, B and C). The sample size differs among the groups, so I want to make it equal across the three groups by subsampling. Each subsample draws different rows from each group, so I want to repeat the subsampling 100 times and run the regression (co2 ~ temp) on each subsample for each group of the treatment. This means I will have 100 regression equations. Later, I want to compare the slope of the regression among the three groups. Is there a simple way to write a loop so that I can compare them?

Thanks in advance!

Angela

===============
Here is the example:

dat <- structure(list(co2 = c(0.15, 0.148, 0.125, 0.145, 0.138, 0.23,
0.26, 0.35, 0.41, 0.45, 0.39, 0.42, 0.4, 0.43, 0.26, 0.3, 0.34,
0.141, 0.145, 0.153, 0.151, 0.128, 0.23, 0.26), temp = c(0.0119,
0.0122, 0.0089, 0.0115, 0.0101, 0.055, 0.097, 0.22, 0.339, 0.397,
0.257, 0.434, 0.318, 0.395, 0.087, 0.13, 0.154, 0.0107, 0.0112,
0.0119, 0.012, 0.0092, 0.055, 0.089), L_group = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("A", "B", "C"), class = "factor")),
.Names = c("co2", "temp", "L_group"), class = "data.frame",
row.names = c(NA, -24L))

head(dat)
library(sampling)

# strata.sampling -----
# Draw a stratified subsample: `size` may be one count per group,
# a single count used for every group, or a fraction (< 1) of each group.
strata.sampling <- function(data, group, size, method = NULL) {
  if (is.null(method)) method <- "srswor"
  temp <- data[order(data[[group]]), ]
  if (length(size) == 1) {
    if (size < 1) {
      size <- round(table(temp[group]) * size)
    } else {
      size <- rep(size, times = length(table(temp[group])))
    }
  }
  strat <- strata(temp, stratanames = names(temp[group]),
                  size = size, method = method)
  getdata(temp, strat)
}

#--------------------------------------------------
sub_dat <- strata.sampling(dat, 'L_group', 4)
Lmodel_subdata1 <- lm(co2 ~ temp, data = sub_dat)
coef(Lmodel_subdata1)

sub_dat2 <- strata.sampling(dat, 'L_group', 4)
Lmodel_subdata2 <- lm(co2 ~ temp, data = sub_dat2)
coef(Lmodel_subdata2)

and so on..... (for 100 times)

Table <- rbind(Lmodel_subdata1$coef, Lmodel_subdata2$coef, ....)
Comment inline

On 17/02/2015 12:40, Angela Smith wrote:
> Hi R user,
> I'm new to R so my problem is probably pretty simple but I'm stuck:
>
> my data is consist of 2 variables: co2, temp and one
> treatment (l_group). The sample size is different among the treatments. so
> that, I wanted to make equal sample size among three groups (A,B and C) of the
> treatment.

Not sure whether that is necessary for regression but you did not tell us why you want to do that.

> For this one, I used subsamples technique. Using
> subsample, each time the data are different among the three groups of the
> treatment.
>
> so that I want to run regression (co2~temp) for a 100
> subsamples for each group of treatment (100 times subsample).

The usual way to do this is to store the subsamples in a list and then write a function and use lapply, say to store your models. You then have another list to which you can then apply the extractor function of your choice.

> it means that I will have 100 regression equations. Later, I want to compare the slope of the
> regression among the three groups. is there simple way to make a loop so that I
> can compare it?
>
> Thanks in advance!
> Angela

-- 
Michael
http://www.dewey.myzen.co.uk
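
The list-and-lapply pattern described in the reply above might look like the following sketch. It is only an illustration under stated assumptions: draw_subsample() is a hypothetical helper written here with base R's sample.int() in place of the sampling package, the group size of 4 is taken from the original post, and the seed is arbitrary. As in the original post, each regression is fitted to one pooled subsample.

```r
# Example data from the original post (24 rows; groups A, B, C of size 8, 6, 10)
dat <- data.frame(
  co2 = c(0.15, 0.148, 0.125, 0.145, 0.138, 0.23, 0.26, 0.35,
          0.41, 0.45, 0.39, 0.42, 0.4, 0.43,
          0.26, 0.3, 0.34, 0.141, 0.145, 0.153, 0.151, 0.128, 0.23, 0.26),
  temp = c(0.0119, 0.0122, 0.0089, 0.0115, 0.0101, 0.055, 0.097, 0.22,
           0.339, 0.397, 0.257, 0.434, 0.318, 0.395,
           0.087, 0.13, 0.154, 0.0107, 0.0112, 0.0119, 0.012, 0.0092, 0.055, 0.089),
  L_group = factor(rep(c("A", "B", "C"), times = c(8, 6, 10)))
)

# Hypothetical helper (not from the thread): draw `size` rows without
# replacement from every level of the grouping column.
draw_subsample <- function(data, group, size) {
  rows_by_group <- split(seq_len(nrow(data)), data[[group]])
  idx <- unlist(lapply(rows_by_group,
                       function(rows) rows[sample.int(length(rows), size)]))
  data[idx, , drop = FALSE]
}

set.seed(42)  # arbitrary seed, only so the run is repeatable

# 1. Store the 100 subsamples in a list
subsamples <- replicate(100, draw_subsample(dat, "L_group", 4), simplify = FALSE)

# 2. Fit one model per subsample and keep the models in a second list
models <- lapply(subsamples, function(d) lm(co2 ~ temp, data = d))

# 3. Apply an extractor (here coef) to every model; one row per subsample
coef_table <- t(sapply(models, coef))
head(coef_table)
```

Keeping the fitted models in a list means other extractors, such as summary() or confint(), can be applied later without refitting.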
Expanding a bit on Michael's answer: you don't need the sampling package for this, just the sample.int() function to draw a random set of integers that you will use to extract rows from each of your groups. Then write a function that returns what you want (the regression slopes from each group) and use that function with the replicate() function. Your problem is a good way to illustrate the lapply(), sapply(), replicate() family of functions in R:

# Split the data into a list of data frames
datlist <- split(dat, dat$L_group)

# Write a function to draw the sample and perform the regression on each group
slopes <- function(lst) {
  # Get the minimum sample size
  minsize <- min(sapply(lst, nrow))
  # Draw sample (row numbers) of size minsize from each group
  samlist <- lapply(sapply(lst, nrow), sample.int, size = minsize)
  # Extract sample from each group
  samples <- lapply(names(lst), function(x) lst[[x]][samlist[[x]], ])
  # Run the regressions for each group and extract the slopes
  results <- sapply(samples, function(x) coef(lm(co2 ~ temp, x))[2])
  # Use the group names to label the slopes
  names(results) <- names(lst)
  return(results)
}

# You can get a single set of results with
(results <- slopes(datlist))
#         A         B         C
# 1.0128392 0.2658041 1.3423786

# To get 100 runs
many <- t(replicate(100, slopes(datlist)))
head(many)
#              A         B        C
# [1,] 1.4326103 0.2658041 1.357475
# [2,] 1.4754324 0.2658041 1.309208
# [3,] 0.9838589 0.2658041 1.408987
# [4,] 0.9993144 0.2658041 1.354297
# [5,] 1.0134187 0.2658041 1.397112
# [6,] 1.4922856 0.2658041 1.312531

-------------------------------------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of Michael Dewey
Sent: Tuesday, February 17, 2015 9:52 AM
To: Angela Smith; r-help at r-project.org
Subject: Re: [R] subsamples and regressions for 100 times

______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
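
For the comparison step in the original question, one possible follow-up to the replicate() approach is to summarize the matrix of slopes by group. The sketch below is an assumption-laden illustration, not something proposed in the thread: group_slopes() is a compact, hypothetical variant of the slopes function, the seed is arbitrary, and the percentile summaries are descriptive only, not a formal test.

```r
# Example data from the original post (groups A, B, C of size 8, 6, 10)
dat <- data.frame(
  co2 = c(0.15, 0.148, 0.125, 0.145, 0.138, 0.23, 0.26, 0.35,
          0.41, 0.45, 0.39, 0.42, 0.4, 0.43,
          0.26, 0.3, 0.34, 0.141, 0.145, 0.153, 0.151, 0.128, 0.23, 0.26),
  temp = c(0.0119, 0.0122, 0.0089, 0.0115, 0.0101, 0.055, 0.097, 0.22,
           0.339, 0.397, 0.257, 0.434, 0.318, 0.395,
           0.087, 0.13, 0.154, 0.0107, 0.0112, 0.0119, 0.012, 0.0092, 0.055, 0.089),
  L_group = factor(rep(c("A", "B", "C"), times = c(8, 6, 10)))
)
datlist <- split(dat, dat$L_group)

# Hypothetical compact variant: subsample every group down to the smallest
# group size and return one slope per group
group_slopes <- function(lst) {
  minsize <- min(sapply(lst, nrow))
  sapply(lst, function(g) {
    rows <- sample.int(nrow(g), minsize)
    coef(lm(co2 ~ temp, data = g[rows, ]))[["temp"]]
  })
}

set.seed(123)  # arbitrary seed, only so the run is repeatable
many <- t(replicate(100, group_slopes(datlist)))

# Per-group spread of the 100 slopes (2.5%, 50% and 97.5% quantiles)
slope_summary <- apply(many, 2, quantile, probs = c(0.025, 0.5, 0.975))
slope_summary

# Run-by-run difference between two groups, here A minus C
diff_AC <- many[, "A"] - many[, "C"]
quantile(diff_AC, probs = c(0.025, 0.5, 0.975))
```

Group B is the smallest group (6 rows), so every draw uses all of its rows and its slope is the same in every run; only the slopes for A and C vary between subsamples.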