Philip Robinson
2012-Mar-01 00:42 UTC
[R] identifying a column name correctly to use in a formula
Hi,

I have a large matrix (SNPs) that I want to cycle over with logistic regression with interaction terms. I have made a loop but I am struggling to identify to the formula the name of the column in a way which is meaningful to the formula. It errors because it is not evaluated properly.

(Below is a pilot with only columns 7 to 33; my actual data has 200,000 columns.)

My attempts:

for (i in 7:33) {
  label <- colnames(n)[i]
  model1 <- glm(AS~label*interaction,family=binomial("logit"),data=n)
  X <- summary(model1)$coefficients[2,1]
  Y <- c(label,X)
  vector <- rbind(vector,Y)
} #variable lengths differ

Error in model.frame.default(formula = AS ~ label, data = n, drop.unused.levels = TRUE) :
  variable lengths differ (found for 'label')

#This is because it is trying to do logistic regression on a character string

for (i in 7:33) {
  label <- eval(colnames(n)[i])
  model1 <- glm(AS~label*interaction,family=binomial("logit"),data=n)
  X <- summary(model1)$coefficients[2,1]
  Y <- c(label,X)
  vector <- rbind(vector,Y)
} #variable lengths differ

Error in model.frame.default(formula = AS ~ label, data = n, drop.unused.levels = TRUE) :
  variable lengths differ (found for 'label')

#same as above

for (i in 7:33) {
  label <- as.name(colnames(n)[i])
  model1 <- glm(AS~label*interaction,family=binomial("logit"),data=n)
  X <- summary(model1)$coefficients[2,1]
  Y <- c(label,X)
  vector <- rbind(vector,Y)
}

Error in model.frame.default(formula = AS ~ label, data = n, drop.unused.levels = TRUE) :
  invalid type (symbol) for variable 'label'

#not sure what this error is

for (i in 7:33) {
  label <- eval(as.name(colnames(n)[i]))
  model1 <- glm(AS~label*interaction,family=binomial("logit"),data=n)
  X <- summary(model1)$coefficients[2,1]
  Y <- c(label,X)
  vector <- rbind(vector,Y)
}

Error in eval(expr, envir, enclos) : object 'B1' not found

#B1 is the name of the first column - this isn't an object and that seems to be why it is causing an error

for (i in 7:33) {
  label <- as.formula(colnames(n)[i])
  model1 <- glm(AS~label*interaction,family=binomial("logit"),data=n)
  X <- summary(model1)$coefficients[2,1]
  Y <- c(label,X)
  vector <- rbind(vector,Y)
}

Error in eval(expr, envir, enclos) : object 'B1' not found

#same as above

for (i in 7:33) {
  label <- eval(as.formula(colnames(n)[i]))
  model1 <- glm(AS~label*interaction,family=binomial("logit"),data=n)
  X <- summary(model1)$coefficients[2,1]
  Y <- c(label,X)
  vector <- rbind(vector,Y)
}

Error in eval(expr, envir, enclos) : object 'B1' not found

#same as above

Any help would be appreciated.

Thanks
Philip
Rui Barradas
2012-Mar-01 03:24 UTC
[R] identifying a column name correctly to use in a formula
Hello,

> I have a large matrix (SNPs) that I want to cycle over with logistic
> regression with interaction terms. I have made a loop but I am struggling
> to identify to the formula the name of the column in a way which is
> meaningful to the formula. It errors because it is not evaluated properly.

You must first write the formula in full, using 'paste'. Try

DF <- data.frame(Resp=rnorm(10), B=rnorm(10), C=rnorm(10), Interaction=rnorm(10))
#DF

for(i in 2:3){
  cname <- colnames(DF)[i]
  # In 3 steps to be more readable
  Regr <- paste(cname, "Interaction", sep="*")
  fmlaText <- paste("Resp", Regr, sep="~")
  # After step 2 it's already printable
  print(fmlaText)
  # Step 3: transform it into a formula object
  fmla <- as.formula(fmlaText)
  model1 <- glm(fmla, data=DF)
  print(summary(model1))
}

Hope this helps,

Rui Barradas
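The same three steps can be carried over to the logistic-regression loop from the original question. What follows is only a sketch on simulated data: the data frame `n` and the names `snp_cols` and `results` are invented here for illustration, and it assumes that `interaction` is a real column in the data. The attempts in the question fail because glm() does not substitute the value of `label` into the formula; it looks for a variable literally called `label`.

```r
# Simulated stand-in for the poster's data (an assumption, not the real data):
# AS is a 0/1 outcome, B1..B3 play the role of SNP columns,
# and 'interaction' is assumed to be an ordinary covariate column.
set.seed(1)
n <- data.frame(
  AS = rbinom(200, 1, 0.5),
  B1 = rbinom(200, 2, 0.3),
  B2 = rbinom(200, 2, 0.3),
  B3 = rbinom(200, 2, 0.3),
  interaction = rnorm(200)
)

snp_cols <- c("B1", "B2", "B3")

# Preallocate the result instead of growing it with rbind() in the loop
results <- data.frame(snp = snp_cols, estimate = NA_real_,
                      stringsAsFactors = FALSE)

for (i in seq_along(snp_cols)) {
  # Build the text "AS ~ B1 * interaction", then turn it into a formula
  fmla <- as.formula(paste("AS ~", snp_cols[i], "* interaction"))
  model1 <- glm(fmla, family = binomial("logit"), data = n)
  # Row 2 of the coefficient table is the SNP main effect
  # (row 1 is the intercept), matching coefficients[2,1] in the question
  results$estimate[i] <- summary(model1)$coefficients[2, 1]
}

print(results)
```

Preallocating `results` is a design choice that matters with 200,000 columns, since calling rbind() on every iteration copies the accumulated result each time.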
R. Michael Weylandt
2012-Mar-01 04:04 UTC
[R] identifying a column name correctly to use in a formula
Your method of constructing a formula is funny: is there a term called "interaction" or do you mean an interaction in the statistical sense?

Once that is settled, I'd think the easiest way to proceed is to use as.formula() to construct your formula programmatically and then to pass that to glm(). Something like

form <- as.formula(paste("AS ~ ", colnames(n)[i], sep = ""))
glm(form, data = n, family = binomial("logit"))

Michael
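If the interaction is meant in the statistical sense, one possible variation is to build the formula with reformulate() and to pick out the interaction coefficient by name instead of by row position. This is a hedged sketch, not what either poster ran: the covariate name `covar`, the helper `fit_one`, and the simulated data are all assumptions made up for the example.

```r
# Simulated data (an assumption): AS is a 0/1 outcome, B1 and B2 stand in
# for SNP columns, and 'covar' is a hypothetical covariate to interact with.
set.seed(42)
n <- data.frame(
  AS = rbinom(300, 1, 0.5),
  B1 = rbinom(300, 2, 0.4),
  B2 = rbinom(300, 2, 0.4),
  covar = rnorm(300)
)

# Hypothetical helper: fit AS ~ snp * covar and return the interaction estimate
fit_one <- function(snp, data) {
  # reformulate() turns the term text and response name into a formula
  form <- reformulate(paste(snp, "* covar"), response = "AS")
  fit <- glm(form, data = data, family = binomial("logit"))
  # Look the interaction term up by name, e.g. "B1:covar"
  coef(fit)[[paste0(snp, ":covar")]]
}

snps <- c("B1", "B2")
# vapply() returns a named numeric vector, one estimate per SNP column
est <- vapply(snps, fit_one, numeric(1), data = n)
print(est)
```

Looking the coefficient up by name avoids silently reading the wrong row if the set of terms in the model changes.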