Claus O'Rourke
2011-Mar-15 13:33 UTC
[R] Newbie-ish question on iteratively applying function to dataframe
Hi,

I am trying to recursively apply a function to a selection of columns in a dataframe. I've had a look around and from what I have read, I should be using some version of the apply function, but I'm really having some headaches with it.

Let me be more specific with an example.

Say I have a data frame similar to the following

A     x     y     z     r1    r2    r3    r4
0.1   0.2   0.1   ...
0.1   0.3   ...
0.2   ...

i.e., a number of columns, each of the same length, and all containing real numbers. Of these columns, I want to model one variable, say A, as a function of other variables, say x, y, z, and any one of my r1, r2, r3, ... variables.

i.e., I want to model

A ~ x + y + z + r1
A ~ x + y + z + r2
....
A ~ x + y + z + rn

But the number of 'r' variables I will have will be large, and I don't know the specific number of these variables in advance.

My first question is: how can I select all the columns in a dataframe that have a heading that matches a string pattern?

And then, related to this, what would be the best way of repeatedly applying my modelling function to the result?

Many thanks for any help for this occasional R amateur.

Claus
Ista Zahn
2011-Mar-15 15:46 UTC
[R] Newbie-ish question on iteratively applying function to dataframe
Hi Claus,

On Tue, Mar 15, 2011 at 9:33 AM, Claus O'Rourke <claus.orourke at gmail.com> wrote:
> Hi,
> I am trying to recursively apply a function to a selection of columns
> in a dataframe. I've had a look around and from what I have read, I
> should be using some version of the apply function, but I'm really
> having some headaches with it.

I would just do it in a loop (see below).

> Let me be more specific with an example.
>
> Say I have a data frame similar to the following
>
> A     x     y     z     r1    r2    r3    r4
> 0.1   0.2   0.1   ...
> 0.1   0.3   ...
> 0.2   ...
>
> i.e., a number of columns, each of the same length, and all containing
> real numbers. Of these columns, I want to model one variable, say A,
> as a function of other variables, say x, y, z, and any one of my r1,
> r2, r3, ... variables.
>
> i.e., I want to model
> A ~ x + y + z + r1
> A ~ x + y + z + r2
> ....
> A ~ x + y + z + rn
>
> But where the number of 'r' variables I will have will be large, and I
> don't know the specific number of these variables in advance.
>
> My question first is, how can I select all the columns in a dataframe
> that have a heading that matches a string pattern?

?grep

> And then related to this, what would be the best way of repeatedly
> applying my modelling function to the result?

Well, I don't know about the "best" way. But why not just

    set.seed(21)
    # 1000 rows x 100 columns of random normal data: A, x, y, z, r1 ... r96
    dat <- as.data.frame(matrix(rnorm(100000), ncol=100,
                                dimnames=list(1:1000,
                                              c("A", "x", "y", "z", paste("r", 1:96, sep="")))))
    mods <- list()
    # loop over every column whose name contains "r" and fit one model per column
    for(i in grep("r", names(dat), value=TRUE)) {
      mods[[i]] <- lm(as.formula(paste("A ~ x + y + z + ", i)), data=dat)
    }

Note that you should be cautious about making any inferences based on this kind of method. In the example above 9 r variables are "significant" at the .05 level, even though the data was generated "randomly":

    # p-value of the r term (5th coefficient row, 4th column) in each model
    sort(sapply(mods, function(x) coef(summary(x))[5, 4]))

Best,
Ista

> Many thanks for any help for this occasional R amateur.
>
> Claus

--
Ista Zahn
Graduate student
University of Rochester
Department of Clinical and Social Psychology
http://yourpsyche.org
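
A variation on the loop above, offered only as a sketch and not as anything from the original posts. It makes three assumptions worth stating: the pattern "^r[0-9]+$" is a guess at how the real r columns are named (it avoids picking up a column such as "error" that merely contains an "r"), reformulate() is swapped in for paste()/as.formula(), and Holm's method in p.adjust() is just one of several possible corrections for having run many tests. The names r_vars, fit_one, p_raw and p_adj are made up for the example.

```r
# Simulated data with the same layout as in the reply: A, x, y, z, r1 ... r96.
set.seed(21)
dat <- as.data.frame(matrix(rnorm(100000), ncol = 100,
                            dimnames = list(NULL,
                                            c("A", "x", "y", "z", paste0("r", 1:96)))))

# Anchored pattern (assumed naming scheme): "r" followed only by digits.
r_vars <- grep("^r[0-9]+$", names(dat), value = TRUE)

# Fit A ~ x + y + z + r_i for a single r variable.
fit_one <- function(r) {
  lm(reformulate(c("x", "y", "z", r), response = "A"), data = dat)
}

# One fitted model per r variable, stored in a named list.
mods <- lapply(r_vars, fit_one)
names(mods) <- r_vars

# p-value of the r term in each model, looked up by name rather than row position.
p_raw <- sapply(r_vars, function(r) coef(summary(mods[[r]]))[r, "Pr(>|t|)"])

# Holm adjustment for the number of tests that were run.
p_adj <- p.adjust(p_raw, method = "holm")

# Compare how many r variables fall below .05 before and after adjustment.
sum(p_raw < 0.05)
sum(p_adj < 0.05)
```

Looking up the coefficient by name means the extraction keeps working if the fixed part of the formula changes, whereas the [5, 4] index depends on there being exactly three other predictors.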