Dear friends,

Hope you are doing great. I want to fit a logistic regression in R, where the dependent variable is the covid status (I used 1 for covid positives and 0 for covid negatives), but when I ran glm, R complained that I should make the dependent variable a factor.

What would be more advisable: to keep the dependent variable with 1s and 0s, or to code it as yes/no and then make it a factor?

Any guidance will be greatly appreciated.

Best regards,

Paul
x <- factor(0:1)
x <- factor(c("no", "yes"))

will produce identical results up to labeling.

Bert Gunter

"The trouble with having an open mind is that people keep coming along and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)

On Sat, Aug 1, 2020 at 10:40 AM Paul Bernal <paulbernal07 at gmail.com> wrote:
> What would be more advisable, to keep the dependent variable with 1s and
> 0s, or code it as yes/no and then make it a factor?
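A minimal sketch of Bert's point, using simulated data (the data frame 'dat' and the columns 'age' and 'covid' are hypothetical stand-ins, not Paul's data set). Both factors have the same level order, "0" before "1" and "no" before "yes", so the fitted coefficients should agree and only the labels differ.

```r
# Hypothetical simulated data: 'age' predictor, binary 'covid' outcome (1 = positive)
set.seed(1)
dat <- data.frame(age = round(runif(200, 20, 80)))
dat$covid <- rbinom(200, size = 1, prob = plogis(-3 + 0.05 * dat$age))

# The same outcome as two factors; levels sort as "0" < "1" and "no" < "yes"
dat$covid_01 <- factor(dat$covid)
dat$covid_yn <- factor(ifelse(dat$covid == 1, "yes", "no"))

fit_01 <- glm(covid_01 ~ age, family = binomial, data = dat)
fit_yn <- glm(covid_yn ~ age, family = binomial, data = dat)

# Only the labels differ, so the estimates should match
all.equal(coef(fit_01), coef(fit_yn))
```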
On Sat, 1 Aug 2020, Paul Bernal wrote:
> What would be more advisable, to keep the dependent variable with 1s and
> 0s, or code it as yes/no and then make it a factor?

Paul,

1 or 0 are equivalent to yes or no, success or failure. All are nominal variables, so all should be factors, regardless of the coding.

Rich
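Whatever labels are chosen, glm() only cares about the order of the levels: per help('glm'), the first level is treated as failure and all others as success. A hedged sketch on simulated data (all names hypothetical) that makes the 0 -> "no", 1 -> "yes" mapping explicit with the levels and labels arguments of factor(), and shows that reversing the level order only flips the sign of every coefficient.

```r
# Hypothetical simulated data
set.seed(2)
age   <- round(runif(200, 20, 80))
covid <- rbinom(200, size = 1, prob = plogis(-3 + 0.05 * age))

# Explicit mapping: 0 -> "no" (first level = failure), 1 -> "yes" (success)
covid_f <- factor(covid, levels = c(0, 1), labels = c("no", "yes"))
# Same data with the level order reversed: now "yes" is treated as failure
covid_rev <- factor(covid_f, levels = c("yes", "no"))

fit_num <- glm(covid ~ age, family = binomial)      # numeric 0/1 response
fit_f   <- glm(covid_f ~ age, family = binomial)    # "no"/"yes" factor
fit_rev <- glm(covid_rev ~ age, family = binomial)  # reversed levels

all.equal(coef(fit_num), coef(fit_f))    # same model as the 0/1 coding
all.equal(coef(fit_f), -coef(fit_rev))   # every coefficient changes sign
```

In practice a yes/no factor gives the same model as the 0/1 numeric coding provided "no" (covid negative) is the first level; factor() sorts character levels alphabetically, which happens to put "no" before "yes".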
Hi Bert,

Thank you for the kind reply. But what if I don't turn the variable into a factor? Let's say that in Excel I just coded the variable as 1s and 0s, imported the dataset into R, and fitted the logistic regression without turning any categorical or dummy variable into a factor. Does R require every dummy variable to be treated as a factor?

Best regards,

Paul

On Sat, Aug 1, 2020, 12:59 PM Bert Gunter <bgunter.4567 at gmail.com> wrote:
> x <- factor(0:1)
> x <- factor(c("no", "yes"))
>
> will produce identical results up to labeling.
Hello,

From the documentation, help('glm'):

Details

    A typical predictor has the form response ~ terms where response is
    the (numeric) response vector and terms is a series of terms which
    specifies a linear predictor for response. For binomial and
    quasibinomial families the response can also be specified as a factor
    (when the first level denotes failure and all others success) or as a
    two-column matrix with the columns giving the numbers of successes and
    failures. A terms specification of the form first + second indicates
    all the terms in first together with all the terms in second with any
    duplicates removed.

There is no need for the response to be a factor; it is optional. The wording is very clear: "For binomial and quasibinomial families the response *can* also be specified as a factor".

And with binary numeric responses I cannot reproduce the warning message; the models fit silently.

Hope this helps,

Rui Barradas

On 01/08/2020 at 18:39, Paul Bernal wrote:
> ... when I ran the glm, R complains that I
> should make the dependent variable a factor.
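The three response forms described in the help page can be compared directly. A sketch under stated assumptions: the data are simulated, and 'exposed' is a hypothetical binary predictor chosen so that the data can be collapsed into a two-column matrix of successes and failures per group. The numeric 0/1 and factor fits should be identical; the grouped fit should agree up to the convergence tolerance of the fitting algorithm.

```r
# Hypothetical individual-level data: 'exposed' (0/1) predictor, 'covid' outcome
set.seed(3)
exposed <- rbinom(300, size = 1, prob = 0.4)
covid   <- rbinom(300, size = 1, prob = ifelse(exposed == 1, 0.35, 0.15))
dat <- data.frame(covid, exposed)

# 1. Numeric 0/1 response: no factor needed
fit_num <- glm(covid ~ exposed, family = binomial, data = dat)

# 2. Factor response: first level ("0") is failure, the other level success
fit_fac <- glm(factor(covid) ~ exposed, family = binomial, data = dat)

# 3. Two-column matrix (successes, failures), one row per exposure group
agg   <- aggregate(covid ~ exposed, data = dat, FUN = sum)              # positives
agg$n <- aggregate(covid ~ exposed, data = dat, FUN = length)$covid     # group sizes
fit_mat <- glm(cbind(covid, n - covid) ~ exposed, family = binomial, data = agg)

all.equal(coef(fit_num), coef(fit_fac))
all.equal(coef(fit_num), coef(fit_mat), tolerance = 1e-6)
```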
On 2/08/20 5:39 am, Paul Bernal wrote:
> ... when I ran the glm, R complains that I
> should make the dependent variable a factor.

There have been many responses to this post, the majority of them being confusing and off the point.

BOTTOM LINE: R/glm() does *NOT* complain that one "should make the dependent variable a factor". This is bovine faecal output.

As Rui Barradas has pointed out (alternatively: RTFM!), when you fit a Bernoulli model using glm(), your response/dependent variable is allowed to be

  * a numeric variable with values 0 or 1
  * a logical variable
  * a factor with two levels

The OP presumably fed glm() a *character* vector with values "0" and "1". Doing *this* will cause glm() to whinge.

I reiterate: RTFM!!! (And perhaps learn to distinguish between character vectors and factors.)

cheers,

Rolf Turner

--
Honorary Research Fellow
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276
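A sketch of Rolf's list and of the failure he suspects, on simulated data (all names hypothetical). The exact error text glm() gives for a character response may differ between R versions, so the sketch only checks that an error occurs. Calling class() or str() on an imported column is a quick way to see whether a "0"/"1" column came in as character.

```r
# Hypothetical simulated data
set.seed(4)
age   <- round(runif(200, 20, 80))
covid <- rbinom(200, size = 1, prob = plogis(-3 + 0.05 * age))

fit_num <- glm(covid ~ age, family = binomial)       # numeric 0/1
fit_lgl <- glm(covid == 1 ~ age, family = binomial)  # logical: TRUE is success

# Character "0"/"1", as a column can end up after a careless import
covid_chr <- as.character(covid)
class(covid_chr)  # "character", which is not a factor
fit_chr <- tryCatch(glm(covid_chr ~ age, family = binomial),
                    error = function(e) e)
inherits(fit_chr, "error")  # TRUE when glm() stopped with an error

# Converting back to numeric (or to a factor) fixes it
fit_fixed <- glm(as.numeric(covid_chr) ~ age, family = binomial)
all.equal(coef(fit_num), coef(fit_fixed))
```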
That's a bit harsh. Isn't the best advice here to post a reproducible example? I believe that has already been mentioned.

Also, for this sort of thing I'd strongly encourage people to give the package and function name, e.g. stats::glm, as there are many R functions for GLMs.

On Sun, Aug 2, 2020 at 12:47 PM Rolf Turner <r.turner at auckland.ac.nz> wrote:
> BOTTOM LINE: R/glm() does *NOT* complain that one "should make the
> dependent variable a factor".
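Along the lines suggested above, a minimal self-contained reproducible example for this kind of question might look like the following sketch. The small data set is simulated and hypothetical; str() shows the type of each column, dput() prints the data in a form others can paste back into R, and the model call is written with the package prefix as stats::glm.

```r
# Small hypothetical data set standing in for the real one
set.seed(5)
dat <- data.frame(age   = round(runif(20, 20, 80)),
                  covid = rbinom(20, size = 1, prob = 0.4))

str(dat)          # shows whether 'covid' is numeric, character or factor
dput(head(dat))   # text that others can paste to recreate the data

# Package-qualified call, so readers know which glm() is meant
fit <- stats::glm(covid ~ age, family = stats::binomial, data = dat)
summary(fit)
```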