mviljamaa
2016-Oct-04 15:39 UTC
[R] What does lm() output coefficient mean when it's been given a categorical predictor of string values?
I'm using lm() for a model with a predictor that takes two values, {poika, tyttö} ("boy" and "girl" in Finnish).

I fit a model with this categorical variable:

fit1 <- lm(dta$X.U.FEFF..mpist. ~ dta$sukup + dta$HISEI + dta$SES)

Although the variable/vector is named dta$sukup here, lm() returns a coefficient

dta$sukuptyttö
    -6.19756

What does the appended 'tyttö' in the coefficient name mean? Does it mean that 'tyttö' has been interpreted as 1 and 'poika' as 0?
David Winsemius
2016-Oct-04 16:18 UTC
[R] What does lm() output coefficient mean when it's been given a categorical predictor of string values?
> On Oct 4, 2016, at 8:39 AM, mviljamaa <mviljamaa at kapsi.fi> wrote:
>
> What does the added 'tyttö' in the variable mean? Does it mean that 'tyttö' has been interpreted as 1 and 'poika' as 0?

Yes.

David Winsemius
Alameda, CA, USA
Peter Dalgaard
2016-Oct-04 16:45 UTC
[R] What does lm() output coefficient mean when it's been given a categorical predictor of string values?
> On 04 Oct 2016, at 17:39, mviljamaa <mviljamaa at kapsi.fi> wrote:
>
> What does the added 'tyttö' in the variable mean? Does it mean that 'tyttö' has been interpreted as 1 and 'poika' as 0?

Short answer: Yes.

Long answer: Yes, if treatment contrast parametrization is being used. See help(contrasts) for a lead-in to an even longer answer.

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com
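
The treatment-contrast coding mentioned above can be checked directly on a small example. The sketch below is only an illustration: the data frame and the `score` column are invented toy values, not the original poster's data, and it assumes R's default contrasts option (contr.treatment for unordered factors) has not been changed.

```r
# Hypothetical toy data: 'score' and its values are invented for illustration.
dta <- data.frame(
  sukup = factor(c("poika", "poika", "poika", "tyttö", "tyttö", "tyttö"),
                 levels = c("poika", "tyttö")),  # first level is the reference
  score = c(10, 12, 14, 5, 6, 7)
)

# Default coding for unordered factors (assumed unchanged here)
getOption("contrasts")

# Dummy coding used by lm(): one column, 0 for the reference level, 1 for the other
contrasts(dta$sukup)

fit <- lm(score ~ sukup, data = dta)
coef(fit)

# With a single two-level factor, the intercept is the mean of the reference
# group and the second coefficient is the difference between the group means.
group_means <- tapply(dta$score, dta$sukup, mean)
diff_means  <- group_means[[2]] - group_means[[1]]
diff_means
```

With these toy numbers the group means are 12 (poika) and 6 (tyttö), so the coefficient named `sukuptyttö` is -6: the level name is appended to the variable name to show which level the 0/1 indicator marks. In a model with further covariates, such as the one in the question, the coefficient is the adjusted difference rather than the raw difference in means.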
Michael Dewey
2016-Oct-05 09:09 UTC
[R] What does lm() output coefficient mean when it's been given a categorical predictor of string values?
See inline

On 04/10/2016 16:39, mviljamaa wrote:
> I'm using lm() for a model that has a predictor that has two values
> {poika, tyttö} (boy and girl in Finnish).
>
> I make a model with this categorical variable:
>
> fit1 <- lm(dta$X.U.FEFF..mpist. ~ dta$sukup + dta$HISEI + dta$SES)

You will find your code easier to read if you write

fit1 <- lm(X.U.FEFF..mpist. ~ sukup + HISEI + SES, data = dta)

> and while the variable/vector is here named as dta$sukup, what lm()
> returns is a coefficient
>
> dta$sukuptyttö
> -6.19756
>
> What does the added 'tyttö' in the variable mean? Does it mean that
> 'tyttö' has been interpreted as 1 and 'poika' as 0?

If you would like it the other way round, see ?relevel

-- 
Michael
http://www.dewey.myzen.co.uk/home.html
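
Both suggestions in this reply, the `data =` argument and `relevel()`, can be tried on a small example. This is a minimal sketch with invented toy data; `score` and `sukup2` are hypothetical names used only for illustration.

```r
# Hypothetical toy data: 'score' is an invented stand-in for the response.
dta <- data.frame(
  sukup = factor(c("poika", "poika", "poika", "tyttö", "tyttö", "tyttö"),
                 levels = c("poika", "tyttö")),
  score = c(10, 12, 14, 5, 6, 7)
)

# Using data = dta keeps the "dta$" prefix out of the coefficient names
fit_boy_ref <- lm(score ~ sukup, data = dta)
names(coef(fit_boy_ref))    # "(Intercept)" "sukuptyttö"

# relevel() moves the chosen level to the front, making it the reference (coded 0)
dta$sukup2 <- relevel(dta$sukup, ref = "tyttö")
levels(dta$sukup2)

fit_girl_ref <- lm(score ~ sukup2, data = dta)
names(coef(fit_girl_ref))   # "(Intercept)" "sukup2poika"

# Same fitted model, opposite sign for the group coefficient
coef(fit_boy_ref)
coef(fit_girl_ref)
```

After releveling, 'tyttö' is coded 0 and 'poika' is coded 1, so the reported coefficient is the same difference with the sign reversed, and the intercept becomes the mean of the new reference group.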