Girish A.R.
2010-Jan-02 03:34 UTC
[R] Help needed on applying a function across different data sets and aggregating the results into a single data set
Hi folks,

Wish y'all a Happy New Year 2010!

I need some help with the following:

Say I have lots of data sets, and I have to apply a certain function to the same set of columns in each of them. For example, a typical data set is:

    df1 <- as.data.frame(cbind(rnorm(10),rnorm(10)))
    names(df1)[1] <- "A"
    names(df1)[2] <- "B"

There are many such data sets, df2, df3, ... I have the names stored in a list:

    DF <- cbind("df1","df2",...,"df100")

I now need to apply the following function:

    a <- lsfit(df1$A,df1$B)

and stack up the following results:

    a$coef
     Intercept          X
    -0.1479750  0.2485416

So, I would end up with as many rows as there are data sets.

I think sapply is the function I should be looking for (at least I have used it to apply a function across different columns of the same data set), but for some reason I'm not able to nail down the final stages in this case.

Earlier, I used something like the following to apply a function across all columns (except the first) of the same data set:

    my.func <- function(x){
      mod <- lrm(my.data$y ~ x)
      data.frame(t(anova(mod)[1, ]), R2 = mod$stats[10])
    }

    sapply(my.data[,-1],my.func)

Where I need help is how to pass the names of the different data sets to the sapply function.

Thanks!
-Girish

===================================
sessionInfo()
R version 2.10.0 (2009-10-26)
i386-pc-mingw32

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] grid      stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] lattice_0.17-26 RWinEdt_1.8-2   ggplot2_0.8.5   digest_0.4.2    reshape_0.8.3
[6] plyr_0.1.9      proto_0.3-8

loaded via a namespace (and not attached):
[1] Formula_0.2-0    kinship_1.1.0-23 MASS_7.3-4       nlme_3.1-96      plm_1.2-1
[6] sandwich_2.2-4   splines_2.10.0   survival_2.35-8  tools_2.10.0

--
View this message in context: http://n4.nabble.com/Help-needed-on-applying-a-function-across-different-data-sets-and-aggregating-the-results-into-a-sint-tp997046p997046.html
Sent from the R help mailing list archive at Nabble.com.
jim holtman
2010-Jan-02 04:28 UTC
[R] Help needed on applying a function across different data sets and aggregating the results into a single data set
try this (and happy new year):

    DF <- cbind("df1","df2",...,"df100")
    result <- lapply(DF, function(.name){
        # look the data frame up by name once, then keep only the coefficients
        .df <- get(.name)
        lsfit(.df$A, .df$B)$coef
    })
    do.call(rbind, result)  # put into matrix

--
Jim Holtman
Cincinnati, OH

What is the problem that you are trying to solve?
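For readers following along, the name-based lookup discussed in this message can also be done in one step with mget(), which takes a character vector of object names and returns a named list. The following is only a minimal sketch: the three simulated data frames and the names df_names, dfs and coefs are stand-ins invented for illustration, since the thread's df1 ... df100 are not available.

```r
set.seed(1)

# Stand-in data sets with the same column names (A, B) as in the thread
df1 <- data.frame(A = rnorm(10), B = rnorm(10))
df2 <- data.frame(A = rnorm(10), B = rnorm(10))
df3 <- data.frame(A = rnorm(10), B = rnorm(10))

# A plain character vector of object names (hypothetical name: df_names)
df_names <- c("df1", "df2", "df3")

# mget() looks up every name and returns a list named after the objects
dfs <- mget(df_names, envir = environment())

# Fit each data set and keep only the coefficients;
# sapply() gives one column per data set, so transpose to get one row each
coefs <- t(sapply(dfs, function(d) lsfit(d$A, d$B)$coefficients))
coefs
```

Because the list returned by mget() is named, the rows of the result are labelled with the data set names, which makes it easy to see which fit belongs to which data set.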
David Winsemius
2010-Jan-02 05:15 UTC
[R] Help needed on applying a function across different data sets and aggregating the results into a single data set
On Jan 1, 2010, at 10:34 PM, Girish A.R. wrote:

> There are many such data sets, df2,df3,... I have the names stored
> in a list
> DF <- cbind("df1","df2",...,"df100")

Unfortunately, that is not a list, but rather a character matrix (a character vector with dimensions) holding only the names:

    > DF <- cbind("df1","df2","df100")
    > str(DF)
     chr [1, 1:3] "df1" "df2" "df100"

Instead define it as a list of objects (i.e., with no quotes):

    LL <- list(df1, df2, .... , df100)
    # and then
    lapply(LL, function(x) lsfit(x$A,x$B) )

--
David.
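The difference between storing names and storing the objects themselves can be checked directly. This is a small sketch under assumptions: the two simulated data frames and the names by_name, by_object and fits are made up for illustration and do not come from the thread.

```r
set.seed(42)

# Two stand-in data sets with columns A and B
df1 <- data.frame(A = rnorm(10), B = rnorm(10))
df2 <- data.frame(A = rnorm(10), B = rnorm(10))

# Quoted names bound with cbind(): a 1 x 2 character matrix, no data inside
by_name <- cbind("df1", "df2")

# Unquoted objects in list(): a list that actually holds the data frames.
# Naming the elements keeps track of which result came from which data set.
by_object <- list(df1 = df1, df2 = df2)

is.list(by_name)    # FALSE
dim(by_name)        # 1 2
is.list(by_object)  # TRUE

# lapply() hands each data frame to the function in turn
fits <- lapply(by_object, function(x) lsfit(x$A, x$B))
names(fits)
fits$df1$coefficients
```

Each element of fits is the full lsfit() result (coefficients, residuals and so on), so the coefficients still have to be extracted before the results can be stacked into rows.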
Girish A.R.
2010-Jan-02 06:15 UTC
[R] Help needed on applying a function across different data sets and aggregating the results into a single data set
Thanks for the replies, Jim, David, and Dennis (who replied to me directly)!

To summarize, here's what worked for me:

==========
dflist <- list(df1, df2, df3)
lsfun <- function(df) with(df, lsfit(A, B)$coef)
res <- lapply(dflist, lsfun)
do.call(rbind, res)
==========

cheers,
-Girish
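A self-contained variant of the lapply/rbind pattern from this thread, for anyone who wants to run it without the original data. It is a sketch under assumptions, not the poster's exact code: the data are simulated, vapply() is swapped in for lapply() plus do.call(rbind, ...) so that each result is checked to be a numeric vector of length 2, and the names coef_table and lm_coefs are invented here. The comparison against lm() is only a sanity check.

```r
set.seed(2010)

# Simulated stand-ins for df1, df2, df3, stored in a named list
dflist <- list(
  df1 = data.frame(A = rnorm(10), B = rnorm(10)),
  df2 = data.frame(A = rnorm(10), B = rnorm(10)),
  df3 = data.frame(A = rnorm(10), B = rnorm(10))
)

# Same helper as in the thread: regress B on A, return the two coefficients
lsfun <- function(df) with(df, lsfit(A, B)$coefficients)

# vapply() stops with an error if any result is not a numeric vector of length 2
res <- vapply(dflist, lsfun, FUN.VALUE = c(Intercept = 0, X = 0))

# One row per data set, with the data set name kept as a column
coef_table <- data.frame(dataset = names(dflist), t(res))
rownames(coef_table) <- NULL
coef_table

# Sanity check: lm(B ~ A) should give the same intercept and slope
lm_coefs <- t(sapply(dflist, function(df) coef(lm(B ~ A, data = df))))
all.equal(unname(t(res)), unname(lm_coefs))
```

Keeping the data frames in a named list means a fourth or hundredth data set only has to be added to dflist; nothing else in the code changes.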