I'm trying to develop a linear model for crop productivity based on variables published as part of the SSURGO database released by the USDA. My default is to just run lm() with continuous predictor variables as numeric and discrete predictor variables as factors. However, some of the discrete variables are ordinal (e.g. drainage class, which ranges from excessively drained to very poorly drained), and treating them as plain factors doesn't make use of the fact that their levels have a known order.

How do I correctly set up a regression model (with lm or similar) to detect the influence of ordinal variables?

How will the output differ compared to the dummy variable outputs for unordered categorical variables?

Thanks,
Alex
I would consider this a question for a statistics forum such as stats.stackexchange.com, not R-help, which is about R programming. They do sometimes intersect, as here, but I think you need to *understand what you're doing* before you write the R code to do it.

Obviously, IMO.

Cheers,
Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )

On Thu, Oct 5, 2017 at 10:54 AM, Alexandra Thorn <alexandra.thorn at gmail.com> wrote:
> How do I correctly set up a regression model (with lm or similar) to
> detect the influence of ordinal variables?
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
This article may be helpful, at least to get you started:

https://www.r-bloggers.com/ordinal-data/

Cheers,
Boris

On Oct 5, 2017, at 3:35 PM, Bert Gunter <bgunter.4567 at gmail.com> wrote:
> They do sometimes intersect, as here, but I think you need to *understand what
> you're doing* before you write the R code to do it.
Try looking at the help page for factor

  ?factor

for something to start with.

--
Don MacQueen
Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062
Lab cell 925-724-7509

On 10/5/17, 10:54 AM, "R-help on behalf of Alexandra Thorn" <r-help-bounces at r-project.org on behalf of alexandra.thorn at gmail.com> wrote:

    How do I correctly set up a regression model (with lm or similar) to
    detect the influence of ordinal variables?

    How will the output differ compared to the dummy variable outputs for
    unordered categorical variables.
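
For readers following the ?factor pointer, here is a minimal sketch of how an ordered factor changes what lm() reports. Everything in it is made up for illustration: the data are simulated, the column names (yield, drain_unordered, drain_ordered) are hypothetical, and the five drainage levels are a shortened stand-in for the real SSURGO classes. It relies only on base R behaviour: an unordered factor gets treatment (dummy) contrasts by default, while factor(..., ordered = TRUE) gets polynomial contrasts (contr.poly).

```r
# Simulated data only; level names are an illustrative subset, not SSURGO's list.
set.seed(1)
drain_levels <- c("excessive", "well", "moderate", "poor", "very poor")
n <- 200
drain_chr <- sample(drain_levels, n, replace = TRUE)

# Fake response: yield falls by a fixed step per drainage class, plus noise.
yield <- 10 - 0.8 * match(drain_chr, drain_levels) + rnorm(n)

d <- data.frame(
  yield           = yield,
  drain_unordered = factor(drain_chr, levels = drain_levels),
  drain_ordered   = factor(drain_chr, levels = drain_levels, ordered = TRUE)
)

# Unordered factor: one dummy coefficient per non-reference level.
fit_unordered <- lm(yield ~ drain_unordered, data = d)

# Ordered factor: coefficients for linear (.L), quadratic (.Q),
# cubic (.C) and 4th-degree (^4) trends across the ordered levels.
fit_ordered <- lm(yield ~ drain_ordered, data = d)

print(names(coef(fit_unordered)))
print(names(coef(fit_ordered)))

# Both codings span the same model space, so the fitted values agree;
# only the parameterisation (and hence the coefficient table) differs.
print(all.equal(unname(fitted(fit_unordered)), unname(fitted(fit_ordered))))
```

In this sketch the two fits describe the same model; the ordered version just re-expresses the level effects as trend components, so a large .L term with small higher-order terms would suggest a roughly linear effect across classes. Whether polynomial contrasts (which assume equally spaced levels) are appropriate for a given variable is a statistical judgment the code does not settle.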