thr3ads.net - R help - [R] Help needed on applying a function across different data sets and aggregating the results into a single data set [Jan 2010]


Girish A.R.

2010-Jan-02 03:34 UTC

[R] Help needed on applying a function across different data sets and aggregating the results into a single data set

Hi folks,

Wish y'all a Happy New Year 2010!

I need some help with the following:

Say I have lots of data sets, and I have to apply a certain function to
the same set of columns in each data set. For example, a typical
data set is:

df1 <- as.data.frame(cbind(rnorm(10),rnorm(10)))
names(df1)[1] <- "A"
names(df1)[2] <- "B"

There are many such data sets, df2,df3,... I have the names stored in a list
DF <- cbind("df1","df2",...,"df100")

I now need to apply the following function:
a <- lsfit(df1$A,df1$B)

and stack up the following results:
a$coef
 Intercept          X 
-0.1479750  0.2485416 

So, I would end up with as many rows as there are data sets.

I think sapply would be the function I should be looking for (at least I
have used it in the case of applying a function across different columns of
the same data set), but for some reason I'm not able to nail down the final
stages in this case. 

Earlier, I used something like the following to apply a
function across all columns (except the first) of the same data set:
my.func <- function(x){
  mod <- lrm(my.data$y ~ x)
  data.frame(t(anova(mod)[1, ]), R2 = mod$stats[10])
}

sapply(my.data[,-1],my.func)

Where I need help is how to pass the names of the different
data sets to the sapply function.

Thanks!
-Girish

===================================
> sessionInfo()
R version 2.10.0 (2009-10-26)
i386-pc-mingw32

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] grid      stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] lattice_0.17-26 RWinEdt_1.8-2   ggplot2_0.8.5   digest_0.4.2    reshape_0.8.3
[6] plyr_0.1.9      proto_0.3-8

loaded via a namespace (and not attached):
[1] Formula_0.2-0    kinship_1.1.0-23 MASS_7.3-4       nlme_3.1-96      plm_1.2-1
[6] sandwich_2.2-4   splines_2.10.0   survival_2.35-8  tools_2.10.0

-- 
View this message in context:
http://n4.nabble.com/Help-needed-on-applying-a-function-across-different-data-sets-and-aggregating-the-results-into-a-sint-tp997046p997046.html
Sent from the R help mailing list archive at Nabble.com.

jim holtman

2010-Jan-02 04:28 UTC


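try this (and happy new year):


DF <- cbind("df1","df2",...,"df100")
result <- lapply(DF, function(.name){
    dat <- get(.name)            # look up the data frame by its name
    lsfit(dat$A, dat$B)$coef     # keep only the coefficients
})
do.call(rbind, result)  # stack into a matrix, one row per data set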
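
For anyone who wants to try the lookup-by-name idea end to end, here is a minimal sketch. It assumes three simulated data frames (df1 to df3 are stand-ins, not the poster's real data) and uses mget(), which fetches several objects by name in one call, in place of repeated get() calls.

```r
set.seed(1)  # simulated stand-ins for the real data sets (an assumption)
df1 <- data.frame(A = rnorm(10), B = rnorm(10))
df2 <- data.frame(A = rnorm(10), B = rnorm(10))
df3 <- data.frame(A = rnorm(10), B = rnorm(10))

DF <- c("df1", "df2", "df3")  # names of the data frames, as strings

# mget() looks up several objects by name and returns them as a named list;
# environment() is the current environment (the workspace at top level)
dfs <- mget(DF, envir = environment())

# fit B on A in each data frame and keep only the coefficient vector
coefs <- lapply(dfs, function(d) lsfit(d$A, d$B)$coefficients)

# stack the named vectors: one row per data set, columns Intercept and X
result <- do.call(rbind, coefs)
result
```

Because the list returned by mget() is named, the rows of the result carry the data set names, which makes it easier to trace a coefficient back to its source.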


-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?


David Winsemius

2010-Jan-02 05:15 UTC


On Jan 1, 2010, at 10:34 PM, Girish A.R. wrote:
> There are many such data sets, df2,df3,... I have the names stored  
> in a list
> DF <- cbind("df1","df2",...,"df100")
Unfortunately, that is not a list, but rather a character matrix (one row, one column per name) that holds only the names:
 > DF <- cbind("df1","df2","df100")
 > str(DF)
  chr [1, 1:3] "df1" "df2" "df100"

Instead define it as a list of objects (i.e., with no quotes):

LL <- list(df1, df2, .... , df100)
#and then
lapply(LL, function(x) lsfit(x$A,x$B) )
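
A small runnable sketch of the list-of-objects approach, assuming three simulated data frames and a named list (the names df1 to df3 and the object coef_mat are illustrative only). It uses sapply() and a transpose as one possible way to get a matrix directly.

```r
set.seed(42)  # simulated data, an assumption for illustration
LL <- list(df1 = data.frame(A = rnorm(10), B = rnorm(10)),
           df2 = data.frame(A = rnorm(10), B = rnorm(10)),
           df3 = data.frame(A = rnorm(10), B = rnorm(10)))

# sapply() simplifies equal-length results into a matrix with one COLUMN
# per list element, so transpose to get one row per data set
coef_mat <- t(sapply(LL, function(x) lsfit(x$A, x$B)$coefficients))
coef_mat
```

Naming the list elements is optional, but the names then show up as row names in the result.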

-- 
David.

Girish A.R.

2010-Jan-02 06:15 UTC


Thanks for the replies, Jim, David, and Dennis (who replied to me directly)! 

To summarize, here's what worked for me:
==========
dflist <- list(df1, df2, df3)
lsfun <- function(df) with(df, lsfit(A, B)$coef)
res <- lapply(dflist, lsfun)
do.call(rbind, res)
==========
cheers,
-Girish
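
A possible variation on the summary above, sketched under assumptions: the data frames are simulated straight into a named list (n_sets = 5 is a hypothetical count), so no separate df1, df2, ... variables are needed, and vapply() replaces lapply() so that each result is checked to be a length-2 numeric vector. The lsfun helper is the one from the summary; out and n_sets are illustrative names.

```r
set.seed(2010)
n_sets <- 5  # hypothetical count for illustration

# build the data frames directly inside a list (simulated data)
dflist <- lapply(seq_len(n_sets),
                 function(i) data.frame(A = rnorm(10), B = rnorm(10)))
names(dflist) <- paste0("df", seq_len(n_sets))

lsfun <- function(df) with(df, lsfit(A, B)$coef)

# vapply() works like sapply() but errors unless every result is a
# numeric vector of length 2; the result has one column per data set
res <- vapply(dflist, lsfun, numeric(2))

# transpose to one row per data set and keep the name as a column
out <- data.frame(dataset = names(dflist), t(res))
rownames(out) <- NULL
out
```

Keeping the data set name as an ordinary column can be convenient if the coefficients are later merged with other per-data-set information.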


