Nilaya Sharma
2011-Aug-29 15:37 UTC
[R] splitting into multiple dataframes and then create a loop to work
Dear All

Sorry for this simple question, I could not solve it despite spending days on it.

My data looks like this:

# data
set.seed(1234)
clvar <- c(rep(1, 10), rep(2, 10), rep(3, 10), rep(4, 10)) # I have 100 levels for this factor var
yvar <- rnorm(40, 10, 6)
var1 <- rnorm(40, 10, 4); var2 <- rnorm(40, 10, 4); var3 <- rnorm(40, 5, 2)
var4 <- rnorm(40, 10, 3); var5 <- rnorm(40, 15, 8) # just example
df <- data.frame(clvar, yvar, var1, var2, var3, var4, var5)

# manual splitting
df1 <- subset(df, clvar == 1)
df2 <- subset(df, clvar == 2)
df3 <- subset(df, clvar == 3)
df4 <- subset(df, clvar == 4)
df5 <- subset(df, clvar == 5)

# I tried to mechanize it
for (i in 1:5) {
  df[i] <- subset(df, clvar == i)
}

I know it should not work as df[i] is a single variable, and it did not. But I could not find a way to output multiple dataframes from this loop. My limited R knowledge did not help at all!

# working on each variable, just trying a simple function
a <- 3:8
out1 <- lapply(1:5, function(ind) {
  lm(df1$yvar ~ df1[, a[ind]])
})
p1 <- lapply(out1, function(m) summary(m)$coefficients[, 4][2])
p1 <- do.call(rbind, p1)

My ultimate objective is to apply this function to all the dataframes created (i.e. df1, df2, df3, df4, df5) and create five corresponding p-value vectors (p1, p2, p3, p4, p5). The output would then be a matrix of clvar and the corresponding p-values:

clvar  var1  var2  var3  var4  var5
1
2
3
4

Please help me!

Thanks

NIL
Dennis Murphy
2011-Aug-29 18:54 UTC
[R] splitting into multiple dataframes and then create a loop to work
Hi:

This is straightforward to do with the plyr package:

# install.packages('plyr')
library('plyr')
set.seed(1234)
df <- data.frame(clvar = rep(1:4, each = 10),
                 yvar = rnorm(40, 10, 6),
                 var1 = rnorm(40, 10, 4),
                 var2 = rnorm(40, 10, 4),
                 var3 = rnorm(40, 5, 2),
                 var4 = rnorm(40, 10, 3),
                 var5 = rnorm(40, 15, 8))

mods <- dlply(df, .(clvar), function(d) lm(yvar ~ . - clvar, data = d))
summary(mods[[1]])

mods is a list of model objects, one per subgroup defined by clvar. You can use extraction functions to pull out pieces from each model, e.g.,

ldply(mods, function(m) summary(m)[['r.squared']])
ldply(mods, function(m) coef(m))
ldply(mods, function(m) resid(m))

The dlply() function reads a data frame as input and outputs to a list; conversely, the ldply() function reads from a list and outputs to a data frame. The functions you call inside have to be compatible with the input and output data types.

HTH,
Dennis
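For readers who prefer not to load plyr, the data-frame-to-list and list-to-table pattern described above can be sketched in base R. This is only a rough analogue under the thread's simulated data, not a drop-in replacement for dlply()/ldply(); the names pieces and coefs are made up for the illustration.

```r
# Same simulated data as in the thread
set.seed(1234)
df <- data.frame(clvar = rep(1:4, each = 10),
                 yvar  = rnorm(40, 10, 6),
                 var1  = rnorm(40, 10, 4),
                 var2  = rnorm(40, 10, 4),
                 var3  = rnorm(40, 5, 2),
                 var4  = rnorm(40, 10, 3),
                 var5  = rnorm(40, 15, 8))

# split() returns a named list with one data frame per level of clvar,
# so df1, df2, ... never have to be created by hand
# ('pieces' is a hypothetical name)
pieces <- split(df, df$clvar)

# Data frame in, list out (roughly what dlply() does):
# one multiple regression per piece
mods <- lapply(pieces, function(d) lm(yvar ~ . - clvar, data = d))

# List in, table out (roughly what ldply() does):
# stack the coefficient vectors into a matrix with one row per clvar level
coefs <- do.call(rbind, lapply(mods, coef))
coefs
```

Because pieces is a list, the same lapply() call works unchanged whether clvar has 4 levels or 100.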
Dimitris Rizopoulos
2011-Aug-29 19:02 UTC
[R] splitting into multiple dataframes and then create a loop to work
You can do this using function lmList() from package nlme, without having to split the data frames, e.g.,

library(nlme)
mlis <- lmList(yvar ~ . - clvar | clvar, data = df)
mlis
summary(mlis)

I hope it helps.

Best,
Dimitris

--
Dimitris Rizopoulos
Assistant Professor
Department of Biostatistics
Erasmus University Medical Center
Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands
Tel: +31/(0)10/7043478
Fax: +31/(0)10/7043014
Web: http://www.erasmusmc.nl/biostatistiek/
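The original post asked for a table of p-values with one row per clvar level and one column per predictor, taken from separate simple regressions of yvar on each variable (the replies above fit one multiple regression per group instead). A minimal base-R sketch of that table follows, assuming the thread's simulated data; xvars, slope_p and pmat are hypothetical names introduced only for this example.

```r
# Same simulated data as in the thread
set.seed(1234)
df <- data.frame(clvar = rep(1:4, each = 10),
                 yvar  = rnorm(40, 10, 6),
                 var1  = rnorm(40, 10, 4),
                 var2  = rnorm(40, 10, 4),
                 var3  = rnorm(40, 5, 2),
                 var4  = rnorm(40, 10, 3),
                 var5  = rnorm(40, 15, 8))

# Predictor columns (hypothetical helper vector)
xvars <- paste0("var", 1:5)

# p-value of the slope from a simple regression of yvar on one predictor
# (hypothetical helper function)
slope_p <- function(d, x) {
  fit <- lm(reformulate(x, response = "yvar"), data = d)
  summary(fit)$coefficients[x, "Pr(>|t|)"]
}

# Inner sapply(): one p-value per predictor within a group.
# Outer sapply(): repeat for every group; t() puts groups in rows.
pmat <- t(sapply(split(df, df$clvar),
                 function(d) sapply(xvars, function(x) slope_p(d, x))))
pmat
```

The row names of pmat are the clvar levels and the column names are var1 to var5, which matches the layout requested in the original post.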