That's a bit harsh.
Isn't the best advice here, to post a reproducible example...
Which I believe has been mentioned.

Also, I'd strongly encourage people to use package+function name, for
this sort of thing.

stats::glm

As there are many R functions for GLMs...

On Sun, Aug 2, 2020 at 12:47 PM Rolf Turner <r.turner at auckland.ac.nz> wrote:
>
> On 2/08/20 5:39 am, Paul Bernal wrote:
>
> > Dear friends,
> >
> > Hope you are doing great. I want to fit a logistic regression in R, where
> > the dependent variable is the covid status (I used 1 for covid positives,
> > and 0 for covid negatives), but when I ran the glm, R complains that I
> > should make the dependent variable a factor.
> >
> > What would be more advisable, to keep the dependent variable with 1s and
> > 0s, or code it as yes/no and then make it a factor?
> >
> > Any guidance will be greatly appreciated,
>
> There have been many responses to this post, the majority of them being
> confusing and off the point.
>
> BOTTOM LINE: R/glm() does *NOT* complain that one "should make the
> dependent variable a factor". This is bovine faecal output.
>
> As Rui Barradas has pointed out (alternatively: RTFM!) when you fit a
> Bernoulli model using glm(), your response/dependent variable is allowed
> to be
>
> * a numeric variable with values 0 or 1
> * a logical variable
> * a factor with two levels
>
> The OP presumably fed glm() a *character* vector with values "0" and
> "1". Doing *this* will cause glm() to whinge.
>
> I reiterate: RTFM!!! (And perhaps learn to distinguish between
> character vectors and factors.)
>
> cheers,
>
> Rolf Turner
>
> --
> Honorary Research Fellow
> Department of Statistics
> University of Auckland
> Phone: +64-9-373-7599 ext. 88276
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
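
For anyone following along, here is a minimal sketch of the point in Rolf's quoted list, as the kind of reproducible example the posting guide asks for. The data are simulated, and the column names (age, covid_num, covid_lgl, covid_fac, covid_chr) are made up for illustration; they are not the OP's data. It fits the same Bernoulli model with the three response types Rolf lists, then tries a character response and catches the result rather than assuming a particular message.

```r
# Simulated data; all variable names here are hypothetical.
set.seed(1)
n   <- 200
age <- rnorm(n, mean = 50, sd = 15)
y   <- rbinom(n, size = 1, prob = plogis(-3 + 0.05 * age))

d <- data.frame(
  age       = age,
  covid_num = y,                                               # numeric 0/1
  covid_lgl = y == 1,                                          # logical
  covid_fac = factor(y, levels = c(0, 1), labels = c("no", "yes")),
  covid_chr = as.character(y),                                 # character "0"/"1"
  stringsAsFactors = FALSE
)

# The three response types listed above give the same fit.
# For a factor, the first level ("no") is treated as failure.
fit_num <- glm(covid_num ~ age, family = binomial, data = d)
fit_lgl <- glm(covid_lgl ~ age, family = binomial, data = d)
fit_fac <- glm(covid_fac ~ age, family = binomial, data = d)

all.equal(coef(fit_num), coef(fit_lgl))
all.equal(coef(fit_num), coef(fit_fac))

# A character response is not one of the accepted types.
# Capture whatever condition glm() signals instead of guessing its text.
res_chr <- tryCatch(
  glm(covid_chr ~ age, family = binomial, data = d),
  error = function(e) e
)
inherits(res_chr, "error")
if (inherits(res_chr, "error")) conditionMessage(res_chr)
```

If the last call does signal an error on your R version, wrapping the response in factor() or as.numeric() before fitting is the usual way out, and str(d) shows at a glance which of the four types a column actually has.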
Martin Maechler
2020-Aug-03 07:25 UTC
[R] [FORGED] Dependent Variable in Logistic Regression
>>>>> Abby Spurdle
>>>>>     on Sun, 2 Aug 2020 15:13:51 +1200 writes:

    > That's a bit harsh. Isn't the best advice here, to post a
    > reproducible example... Which I believe has been
    > mentioned.

    > Also, I'd strongly encourage people to use
    > package+function name, for this sort of thing.

    > stats::glm

    > As there are many R functions for GLMs...

Sorry, Abby, I do disagree here (strongly enough as to warrant
this reply):

We're talking about doing "basic" statistics with R, and these
functions in the stats package have been part of R even before
R got a version number.

So, no, glm() {and the stats package} are the default and I still
think everybody should know and assume that.

Martin
> Sorry, Abby, I do disagree here (strongly enough as to warrant
> this reply):

Which part are you disagreeing with?
That unambiguous names/references should be used, or that there are
many R functions for GLMs?
The wording of your post suggests (kind of) that there is only one R
function for GLMs.

> We're talking about doing "basic" statistics with R, and these
> functions in the stats package have been part of R even before
> R got a version number.

Remember, not everyone is using the same R packages as you.
And some people have done university courses, or online courses,
etc., in R, without ever using one function from the stats package.

I'm reluctant to assume that all R users will have a common understanding.
And what may seem obvious to you or me may seem quite foreign to some
users, or vice versa.

> So, no, glm() {and the stats package} are the default and I still
> think everybody should know and assume that.

But perhaps most importantly, the OP said "the glm".
He never said "glm()"; the subsequent posters did.

Rolf suggested his post was bullshit, after removing the lexical peroxide.
How does anyone know that it wasn't a genuine post, but in reference
to something other than stats::glm?

Shouldn't people be innocent until proven guilty?
Otherwise (something I have been guilty of in the past), the mailing
list turns into statistical propaganda...

Even if the OP was referring to stats::glm, I'm still inclined to feel
the post was legitimate, just a bit short on technical details...
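
To make the naming question concrete, here is a small sketch of what a bare glm resolves to versus the qualified stats::glm. It assumes a fresh session with the default packages attached; the masking function is a deliberately silly, hypothetical stand-in for whatever a course, script, or add-on package might have defined under the same name, and mtcars is only used as a convenient built-in data set.

```r
# In a fresh session, a bare `glm` is found in the stats namespace.
default_pkg <- environmentName(environment(glm))
same_before <- identical(glm, stats::glm)

# Hypothetical masking definition in the global environment.
glm <- function(...) stop("this is not stats::glm")
same_while_masked <- identical(glm, stats::glm)

# The qualified call is unaffected by the masking definition.
fit <- stats::glm(am ~ wt, family = binomial, data = mtcars)
class(fit)

# Remove the masking definition so bare `glm` resolves to stats again.
rm(glm)
same_after <- identical(glm, stats::glm)

c(default_pkg = default_pkg,
  same_before = same_before,
  same_while_masked = same_while_masked,
  same_after = same_after)
```

Both positions in the thread are visible here: with nothing else in the way, glm() and stats::glm() are the same function, and the qualified form only starts to matter once something else with that name sits earlier on the search path. When in doubt, printing environment(glm), or running find("glm"), shows which one a session is actually using.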