Nilaya Sharma
2011-Aug-29 15:37 UTC
[R] splitting into multiple dataframes and then create a loop to work
Dear All
Sorry for this simple question; I have spent days on it without finding a solution.
My data looks like this:
# data
set.seed(1234)
clvar <- c(rep(1, 10), rep(2, 10), rep(3, 10), rep(4, 10))  # I have 100 levels for this factor var
yvar <- rnorm(40, 10, 6)
var1 <- rnorm(40, 10, 4); var2 <- rnorm(40, 10, 4); var3 <- rnorm(40, 5, 2)
var4 <- rnorm(40, 10, 3); var5 <- rnorm(40, 15, 8)  # just example
df <- data.frame(clvar, yvar, var1, var2, var3, var4, var5)
# manual splitting
df1 <- subset(df, clvar == 1)
df2 <- subset(df, clvar == 2)
df3 <- subset(df, clvar == 3)
df4 <- subset(df, clvar == 4)
df5 <- subset(df, clvar == 5)
# i tried to mechanize it
for(i in 1:5) {
df[i] <- subset(df, clvar == i)
}
I know it should not work, as df[i] is a single variable, and it did not. But I
could not find a way to output multiple data frames from this loop. My limited
R knowledge did not help at all!
# working on each of variable, just trying simple function
a <- 3:8
out1 <- lapply(1:5, function(ind) {
  lm(df1$yvar ~ df1[, a[ind]])
})
p1 <- lapply(out1, function(m) summary(m)$coefficients[, 4][2])
p1 <- do.call(rbind, p1)
My ultimate objective is to apply this function to all the data frames
created (i.e. df1, df2, df3, df4, df5) and create five corresponding p-value
vectors (p1, p2, p3, p4, p5). The output would then be a matrix of clvar and
the corresponding p-values:
clvar var1 var2 var3 var4 var5
1
2
3
4
Please help me!
Thanks
NIL
Dennis Murphy
2011-Aug-29 18:54 UTC
Hi:
This is straightforward to do with the plyr package:
# install.packages('plyr')
library('plyr')
set.seed(1234)
df <- data.frame(clvar = rep(1:4, each = 10), yvar = rnorm(40, 10, 6),
                 var1 = rnorm(40, 10, 4), var2 = rnorm(40, 10, 4),
                 var3 = rnorm(40, 5, 2), var4 = rnorm(40, 10, 3),
                 var5 = rnorm(40, 15, 8))
mods <- dlply(df, .(clvar), function(d) lm(yvar ~ . - clvar, data = d))
summary(mods[[1]])
mods is a list of model objects, one per subgroup defined by clvar.
You can use extraction functions to pull out pieces from each model,
e.g.,
ldply(mods, function(m) summary(m)[['r.squared']])
ldply(mods, function(m) coef(m))
ldply(mods, function(m) resid(m))
The dlply() function takes a data frame as input and returns a
list; conversely, the ldply() function takes a list and returns
a data frame. The functions you call inside have to be compatible
with the input and output data types.
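For readers who prefer not to install plyr, the same split, apply and
combine steps can be sketched in base R. This is only an illustration
on the simulated data from this thread; the object names pieces,
mods_base, r2 and coefs are made up here and do not come from plyr.

```r
# Base-R sketch of the split-apply-combine steps (no extra packages).
set.seed(1234)
df <- data.frame(clvar = rep(1:4, each = 10), yvar = rnorm(40, 10, 6),
                 var1 = rnorm(40, 10, 4), var2 = rnorm(40, 10, 4),
                 var3 = rnorm(40, 5, 2), var4 = rnorm(40, 10, 3),
                 var5 = rnorm(40, 15, 8))

# Split: a named list of data frames, one per level of clvar.
# This replaces df1, df2, ... created by hand or in a loop.
pieces <- split(df, df$clvar)

# Apply: fit one model per piece; the result is a list of lm objects.
mods_base <- lapply(pieces, function(d) lm(yvar ~ . - clvar, data = d))

# Combine: one R-squared per group (named numeric vector) ...
r2 <- sapply(mods_base, function(m) summary(m)$r.squared)

# ... and one row of coefficients per group (matrix, groups in rows).
coefs <- t(sapply(mods_base, coef))
```

Keeping the pieces in a list means pieces[["1"]] plays the role of df1,
and the code does not change when clvar has 100 levels instead of 4.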
HTH,
Dennis
Dimitris Rizopoulos
2011-Aug-29 19:02 UTC
You can do this using function lmList() from package nlme, without
having to split the data frames, e.g.,
library(nlme)
mlis <- lmList(yvar ~ . - clvar | clvar, data = df)
mlis
summary(mlis)
I hope it helps.
Best,
Dimitris
--
Dimitris Rizopoulos
Assistant Professor
Department of Biostatistics
Erasmus University Medical Center
Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands
Tel: +31/(0)10/7043478
Fax: +31/(0)10/7043014
Web: http://www.erasmusmc.nl/biostatistiek/
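Neither reply builds the clvar-by-variable matrix of p-values described
in the original question. A minimal base-R sketch of one way to get it
follows. It assumes the goal is the slope p-value from a separate
simple regression of yvar on each of var1 to var5 within each level of
clvar, as in the lm() call in the question. The names preds, slope_p
and pmat are hypothetical and chosen only for this example.

```r
set.seed(1234)
df <- data.frame(clvar = rep(1:4, each = 10), yvar = rnorm(40, 10, 6),
                 var1 = rnorm(40, 10, 4), var2 = rnorm(40, 10, 4),
                 var3 = rnorm(40, 5, 2), var4 = rnorm(40, 10, 3),
                 var5 = rnorm(40, 15, 8))

preds <- paste0("var", 1:5)  # predictor columns to test one at a time

# p-value of the slope in the simple regression yvar ~ v, fitted on d
slope_p <- function(d, v) {
  fit <- lm(reformulate(v, response = "yvar"), data = d)
  summary(fit)$coefficients[v, "Pr(>|t|)"]
}

# For each level of clvar, compute the five p-values; the outer sapply()
# returns predictors in rows, so transpose to get clvar levels in rows.
pmat <- t(sapply(split(df, df$clvar),
                 function(d) sapply(preds, function(v) slope_p(d, v))))
pmat
```

The row names of pmat are the levels of clvar and the column names are
var1 to var5, which matches the layout sketched in the question.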