Steve_Friedman at nps.gov
2010-Feb-05 18:10 UTC
[R] glm models with more than one response
Hi everyone,

I am trying to construct a glm and am running into a couple of questions.

The data set I am using consists of 6 categories for the response and 6 independent predictors representing nutrient concentrations at sample point locations. Ultimately I'd like to use the probabilities for each response category in a simulation model such that these probabilities are used to define a realized ecological niche.

When I try the following it works for a single response.

    Typha.glm <- glm(fwc$VegType == "Cattail" ~ fwc$TP + fwc$TC + fwc$TN +
        fwc$BD + fwc$LOI + fwc$Total_Mg, family = poisson)

But if I try this without specifying a specific VegType it fails.

    plants.glm <- glm(fwc$VegType ~ fwc$TP + fwc$TC + fwc$TN + fwc$BD +
        fwc$LOI + fwc$Total_Mg, family = poisson)

    Error in y + 0.1 : non-numeric argument to binary operator
    In addition: Warning message:
    In model.matrix.default(mt, mf, contrasts) :
      variable 'fwc$VegType' converted to a factor

My questions are:

1. How can I extract the probability of the VegType for different concentrations of each of the independent parameters?

2. Do I need to run this model extracting for a specific VegType each time or is there a way to run the glm for each VegType in one statement?

3. I've experimented with binning each of the nutrient values to establish frequency values for each VegType, but am uncertain how to reconstruct that data into a common data set that can be submitted to the glm. Is this step even necessary?

Running on Windows XP
R 2.10.1

Thanks and Much Appreciated

Steve

Steve Friedman Ph. D.
Spatial Statistical Analyst
Everglades and Dry Tortugas National Park
950 N Krome Ave (3rd Floor)
Homestead, Florida 33034

Steve_Friedman at nps.gov
Office (305) 224 - 4282
Fax (305) 224 - 4147
On Fri, 2010-02-05 at 13:10 -0500, Steve_Friedman at nps.gov wrote:
> Hi everyone,
>
> I am trying to construct a glm and am running into a couple of questions.
>
> The data set I am using consists of 6 categories for the response and 6
> independent predictors representing nutrient concentrations at sample point
> locations. Ultimately I'd like to use the probabilities for each response
> category in a simulation model such that these probabilities are used to
> define a realized ecological niche.
>
> When I try the following it works for a single response.
>
> Typha.glm <- glm(fwc$VegType == "Cattail" ~ fwc$TP + fwc$TC + fwc$TN +
> fwc$BD + fwc$LOI + fwc$Total_Mg, family = poisson)

By the way, you aren't really using the power of formulas with the above:

    Typha.glm <- glm(fwc$VegType == "Cattail" ~ TP + TC + TN + BD + LOI +
        Total_Mg, data = fwc, family = poisson)

Depending on what model you are actually fitting, you could probably simplify the LHS of that formula too.

Is that model appropriate with binary data? The response is now 0/1 (TRUE/FALSE).

This sounds more like a multinomial type model to me. You might want to look at this recent paper in J Statistical Software by Thomas Yee:

http://www.jstatsoft.org/v32/i10

The paper covers his VGAM software but does, IIRC, comment on R packages for fitting a wide range of categorical models.

This isn't really my field, but you probably need to think about the nature of the response a bit more. Maybe follow up on R-SIG-Ecology?

HTH

G

--
Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
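Editorial sketch of the points raised in this reply, not code from the thread. It assumes a small simulated data frame (`fwc_sim`, with made-up values and only two of the six nutrients) standing in for the poster's `fwc`, and it swaps in `family = binomial` for the 0/1 response, which is one common choice given the doubt expressed about the Poisson family. The column `is_cattail` and the grid `new_sites` are hypothetical names introduced for illustration.

```r
# Simulated stand-in for the poster's `fwc` data; all values are made up.
set.seed(1)
n <- 200
fwc_sim <- data.frame(
  VegType = factor(sample(c("Cattail", "Sawgrass", "Slough"), n, replace = TRUE)),
  TP = runif(n, 100, 1500),
  TN = runif(n, 10, 40)
)

# Build the binary indicator once so the formula's left-hand side stays simple.
fwc_sim$is_cattail <- as.integer(fwc_sim$VegType == "Cattail")

# Variables are looked up in `data`, so no `fwc_sim$` prefixes are needed.
# Assumption: binomial family for a 0/1 response (the original used poisson).
typha_glm <- glm(is_cattail ~ TP + TN, data = fwc_sim, family = binomial)

# Predicted probability of Cattail over a grid of TP, holding TN at its mean.
new_sites <- data.frame(
  TP = seq(100, 1500, length.out = 5),
  TN = mean(fwc_sim$TN)
)
p_cattail <- predict(typha_glm, newdata = new_sites, type = "response")
print(round(p_cattail, 3))
```

With `type = "response"`, `predict()` returns values on the probability scale rather than the link (logit) scale, which is roughly what question 1 asks for in the single-category case.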
Hi Steve:

On Fri, Feb 5, 2010 at 10:10 AM, <Steve_Friedman@nps.gov> wrote:
> Hi everyone,
>
> I am trying to construct a glm and am running into a couple of questions.
>
> The data set I am using consists of 6 categories for the response and 6
> independent predictors representing nutrient concentrations at sample point
> locations. Ultimately I'd like to use the probabilities for each response
> category in a simulation model such that these probabilities are used to
> define a realized ecological niche.

Isn't this a multinomial logistic regression model? I don't think this will work with glm(). I'd look into the rms package of Frank Harrell's group (function lrm), the VGAM package and the mlogit package for what I think are more appropriate alternatives. If the response is unordered, then multinom() in the nnet package is another alternative.

Laura Thompson has been maintaining a book-length project showing how to use R/S-PLUS to do the exercises in Agresti's Categorical Data Analysis book. The latest version is here:

https://home.comcast.net/~lthompson221/Splusdiscrete2.pdf

Chapter 7 is on multinomial logit models (unordered and ordered).

HTH,
Dennis
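Editorial sketch of the multinomial approach suggested above, using `multinom()` from the nnet package. It is not code from the thread: the data frame `fwc_sim`, its three made-up vegetation classes, and the rule that generates them are all assumptions for illustration, and only two predictors are used instead of the poster's six.

```r
library(nnet)  # recommended package, normally installed alongside R

# Simulated stand-in for the poster's data; values and class rule are made up.
set.seed(2)
n <- 300
fwc_sim <- data.frame(TP = runif(n, 0, 10), TN = runif(n, 0, 10))
score <- fwc_sim$TP - fwc_sim$TN + rnorm(n, sd = 2)
fwc_sim$VegType <- cut(
  score,
  breaks = c(-Inf, -2, 2, Inf),
  labels = c("Sawgrass", "Slough", "Cattail")
)

# One model for all categories: the factor response goes straight into the
# formula, with no per-category indicator and no binning of the predictors.
plants_mn <- multinom(VegType ~ TP + TN, data = fwc_sim, trace = FALSE)

# Predicted probability of every VegType at a few hypothetical sites.
new_sites <- data.frame(TP = c(1, 5, 9), TN = c(5, 5, 5))
probs <- predict(plants_mn, newdata = new_sites, type = "probs")
print(round(probs, 3))
```

Each row of `probs` corresponds to one row of `new_sites` and sums to 1 across the vegetation types, so a single fit speaks to questions 1 and 2 together. The raw (unbinned) nutrient values are used directly, which suggests the binning step in question 3 may not be needed for this kind of model.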