Hi All,

I am trying to fit my data with a glm model. My data is a matrix of size n*100, so I have n rows and 100 columns, and my vector y is of size n and contains the labels (0 or 1).

My question is: instead of manually typing the model as

    glm.fit = glm(y ~ x[,1] + x[,2] + ... + x[,100], family = binomial())

I have a for loop that concatenates the x variables as follows:

    final_str = NULL
    for (m in 1:100) {
      str = paste(x[,m], +, sep = "")
      final_str = paste(final_str, str, sep = "")
    }

    glm.fit = glm(y ~ final_str, family = binomial())

but final_str is treated as a string and it does not work. Could you please help me with fixing that?

Thanks a lot,
Andra
Hi Andra,

There are several problems with what you are doing (by the way, I point them out so you can learn and improve, not to be harsh or rude). The good news is there is a solution (#3) that is easier than what you are doing right now!

1) glm.fit() is a function, so it is a good idea not to use it as a variable name.

2) You are looping through your variables, when you could avoid the loop and use:

    paste(x, collapse = " + ")

for example, with the first ten letters of the alphabet standing in for the variable names:

    > paste(LETTERS[1:10], collapse = " + ")
    [1] "A + B + C + D + E + F + G + H + I + J"

3) If you store your data in a data frame like:

    dat <- as.data.frame(cbind(Y = y, x))

you do not need to do anything other than:

    glm(Y ~ ., data = dat, family = binomial)

because R will expand the "." to be every variable in the dataset that is not the outcome. This would be my recommendation.

4) If you really wanted to use your pasted string, try it like this:

    f <- "mpg ~ hp"                    # create formula as string
    lm(as.formula(f), data = mtcars)   # convert to formula and use in model

although there are many variants of this, some of which may be better. Still, I would recommend #3 in your case over #4.

I hope this helps,

Josh

On Mon, Aug 22, 2011 at 9:43 PM, Andra Isan <andra_isan at yahoo.com> wrote:
> Hi All,
>
> I am trying to fit my data with glm model, my data is a matrix of size n*100. So, I have n rows and 100 columns and my vector y is of size n which contains the labels (0 or 1)
>
> My question is:
> instead of manually typing the model as
> glm.fit = glm(y~ x[,1]+x[,2]+...+x[,100], family=binomial())
>
> I have a for loop as follows that concatenates the x variables as follows:
>
> final_str=NULL
> for (m in 1:100){
> str = paste(x[,m],+,sep="")
> final_str= paste(final_str,str,sep="")
> }
>
> glm.fit = glm(y~final_str,family=binomial())
> but final_str is treated as a string and it does not work. Could you please help me with fixing that?
>
> Thanks a lot,
> Andra
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, ATS Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/
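Options #3 and #4 from the reply can be tried side by side on a small simulated data set. The following is only a sketch under stated assumptions: the simulated matrix (5 columns instead of 100), the column names V1 to V5, and the object names fit_dot, fit_str and fit_ref are made up for illustration and do not come from the thread. It also shows reformulate() from the stats package as one of the "variants" of building a formula from names.

```r
# Simulated stand-in for the poster's data (assumption): n rows, 5 predictors.
set.seed(1)
n <- 200
x <- matrix(rnorm(n * 5), nrow = n)
colnames(x) <- paste0("V", 1:5)
y <- rbinom(n, size = 1, prob = plogis(x[, 1] - 0.5 * x[, 2]))

# Option #3: store everything in a data frame and let "." expand
# to every column that is not the outcome.
dat <- as.data.frame(cbind(Y = y, x))
fit_dot <- glm(Y ~ ., data = dat, family = binomial)

# Option #4: paste the column NAMES (not the column values) together,
# then convert the resulting string to a formula.
rhs <- paste(colnames(x), collapse = " + ")
f <- as.formula(paste("Y ~", rhs))
fit_str <- glm(f, data = dat, family = binomial)

# Variant of #4: reformulate() builds the formula without manual pasting.
f2 <- reformulate(colnames(x), response = "Y")
fit_ref <- glm(f2, data = dat, family = binomial)

# The three fits use the same model, so their coefficients should agree.
print(all.equal(coef(fit_dot), coef(fit_str)))
print(all.equal(coef(fit_dot), coef(fit_ref)))
```

Note that the fitted models are stored under names other than glm.fit, in line with point 1 of the reply.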
Thanks a lot Joshua. That clearly solved my problem. I actually tried number 3 and it works perfectly fine. I used the prediction function as follows:

    pred = predict(glm.fit, data = dat, type = "response")

(glm.fit is my fitted model) to see how it predicts on my whole data, but obviously I have to do cross-validation to train the model on one part of my data and predict on the other part. So I searched for it and found the function cv.glm in package boot. I tried to use it as:

    cv.glm = (cv.glm(dat, glm.fit, cost, K = nrow(dat))$delta)

but I am not sure how to do the prediction for the hold-out data. Is there any better way for cross-validation to learn a model on training data and test it on test data in R?

Thanks,
Andra

--- On Mon, 8/22/11, Joshua Wiley <jwiley.psych at gmail.com> wrote:

> From: Joshua Wiley <jwiley.psych at gmail.com>
> Subject: Re: [R] GLM question
> To: "Andra Isan" <andra_isan at yahoo.com>
> Cc: r-help at r-project.org
> Date: Monday, August 22, 2011, 9:54 PM
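One possible way to approach the hold-out question is sketched below, first with a manual train/test split and then with cv.glm() from the boot package. This is an illustration under assumptions, not an answer given in the thread: the simulated data, the 70/30 split, the 0.5 cutoff, K = 10, and all object names (train, test, fit_train, cost, and so on) are hypothetical. One detail worth checking against the call quoted above: predict() for glm objects takes new observations through the newdata argument, so a data = argument is, as far as I can tell, passed into "..." and ignored, and the fitted values for the training data are returned instead.

```r
library(boot)  # provides cv.glm(); distributed with standard R installations

# Simulated stand-in for the poster's data (assumption): 4 predictors, 0/1 labels.
set.seed(42)
n <- 300
x <- matrix(rnorm(n * 4), nrow = n)
colnames(x) <- paste0("V", 1:4)
y <- rbinom(n, size = 1, prob = plogis(1.5 * x[, 1] - x[, 2]))
dat <- as.data.frame(cbind(Y = y, x))

# Manual hold-out split: fit on 70% of the rows, predict on the other 30%.
train_idx <- sample(n, size = round(0.7 * n))
train <- dat[train_idx, ]
test  <- dat[-train_idx, ]

fit_train <- glm(Y ~ ., data = train, family = binomial)

# Hold-out rows go in through `newdata`; type = "response" gives probabilities.
p_test <- predict(fit_train, newdata = test, type = "response")
test_error <- mean((p_test > 0.5) != test$Y)  # misclassification rate on hold-out rows

# K-fold cross-validation on the full data with boot::cv.glm().
# cost(observed, predicted probability): misclassification rate at a 0.5 cutoff.
cost <- function(obs, pred) mean(abs(obs - pred) > 0.5)

fit_all <- glm(Y ~ ., data = dat, family = binomial)
cv_out <- cv.glm(dat, fit_all, cost = cost, K = 10)
cv_error <- cv_out$delta[1]  # first element: raw cross-validation estimate

print(c(holdout = test_error, cv10 = cv_error))
```

Setting K = nrow(dat), as in the call quoted above, gives leave-one-out cross-validation, which refits the model once per row and can be slow for large n.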
The minimum achievable level of significance is defined as the minimum of Prob(Y=y) over all y's. I have a GLM with a treatment and replicates, and I would like to find out how to compute the minimum achievable level of significance for that GLM in R.

For example, how do I do this for the following data:

    Treat_Rep1  Treat_Rep2  Control_Rep1  Control_Rep2
    4           10          2             8

--
Thanks,
Jim.