Dear R users,

I'm new to both R and to this list and would like to get advice on how to build generalized additive models in R. Based on the description of gam, which I found on the R website, I specified the following model:

    model1<-gam(ST~s(MOWST1),family=binomial,data=strikes.S)

in which ST is my binary response variable and MOWST1 is a categorical independent variable.

I get the following error message:

    Error in smooth.construct.tp.smooth.spec(object, data, knots) :
            NA/NaN/Inf in foreign function call (arg 1)
    In addition: Warning messages:
    1: argument is not numeric or logical: returning NA in: mean.default(xx)
    2: - not meaningful for factors in: Ops.factor(xx, shift[i])

I would greatly appreciate it if someone could tell me what I did wrong. Can I use categorical independent variables in gam at all?

Many thanks,
Istvan
I.Szentirmai said the following on 2006-01-19 19:43:
> Dear R users,
>
> I'm new to both R and to this list and would like to get
> advice on how to build generalized additive models in R.
> Based on the description of gam, which I found on the R

Which `gam'? Note that R ships with package `mgcv', which has a `gam' function, but package `gam' on CRAN also has a `gam' function. (Furthermore, several other packages exist with functions that I'd categorize as GAM fitters, e.g. SemiPar, assist, gss, gamlss, ...)

> website, I specified the following model:
> model1<-gam(ST~s(MOWST1),family=binomial,data=strikes.S),
> in which ST is my binary response variable and MOWST1 is a
> categorical independent variable.
>
> I get the following error message:
> Error in smooth.construct.tp.smooth.spec(object, data,

From this error message, however, I can deduce that we're talking about the `mgcv::gam' function.

> knots) :
> NA/NaN/Inf in foreign function call (arg 1)
> In addition: Warning messages:
> 1: argument is not numeric or logical: returning NA in:
> mean.default(xx)
> 2: - not meaningful for factors in: Ops.factor(xx,
> shift[i])
>
> I would greatly appreciate if someone could tell me what I
> did wrong. Can I use categorical independents in gam at
> all?

It's not clear to me what you mean by this. Yes, you can use factors in gam:

    gam(ST ~ MOWST1, family = binomial, data = strikes.S)

would work. But you tried smoothing a factor, which isn't supported (and to me it doesn't make any sense to do so). Smoothing an ordered factor may make sense, but this is not supported by `mgcv' (and, according to the error message above, you didn't try it). I was under the impression that the `gam' function in package `gam' should be able to do this, but I just tried it and was rewarded with the error message "Error: 'codes' is defunct.", which comes from the internals of `gam' calling a defunct R function. I've e-mailed Prof Hastie, maintainer of package `gam', about this.
Even if it worked, the `gam' package won't allow estimation of the degree of smoothness of the model terms as part of the fitting process. So if this is what you want in combination with ordered factors, you're probably out of luck. (You can always send Prof Wood, the `mgcv' maintainer, a feature request.)

HTH,
Henric
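A minimal sketch of Henric's suggestion, entering the factor as an ordinary parametric term in `mgcv::gam' rather than inside s(). The poster's data aren't available, so `strikes.S', `ST' and `MOWST1' are filled with simulated values here; the three levels "a", "b", "c" and their effects on the logit scale are hypothetical.

```r
## Sketch with simulated data: strikes.S, ST and MOWST1 are hypothetical
## stand-ins for the poster's data set, which we don't have.
library(mgcv)  # mgcv ships with R as a recommended package

set.seed(1)
n <- 300
MOWST1 <- factor(sample(c("a", "b", "c"), n, replace = TRUE))

## Hypothetical effect of each level on the logit scale
eta <- c(a = -1, b = 0, c = 1)[as.character(MOWST1)]
ST <- rbinom(n, size = 1, prob = plogis(eta))
strikes.S <- data.frame(ST = ST, MOWST1 = MOWST1)

## The factor is a parametric term: one coefficient per non-baseline level,
## no smoothing involved (so no distance between levels is needed)
model1 <- gam(ST ~ MOWST1, family = binomial, data = strikes.S)
summary(model1)
```

The fit is equivalent to a logistic regression on the factor; continuous covariates could still be added as s() terms alongside it.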
> I'm new to both R and to this list and would like to get
> advice on how to build generalized additive models in R.
> Based on the description of gam, which I found on the R
> website, I specified the following model:
> model1<-gam(ST~s(MOWST1),family=binomial,data=strikes.S),
> in which ST is my binary response variable and MOWST1 is a
> categorical independent variable.
>
> I get the following error message:
> Error in smooth.construct.tp.smooth.spec(object, data,
> knots) :
> NA/NaN/Inf in foreign function call (arg 1)

- I guess this should maybe get trapped a bit earlier, so that you get a more informative warning.

- The basic problem is that GAMs are built around sums of smooth functions of covariates. For the notion of `smooth' to be meaningful, the covariates have to live in a space where you have at least a notion of distance between covariate values, since in some loose sense `smooth' means that f(x_1) must be close to f(x_2) if x_1 and x_2 are close. For factors you don't generally have any notion of distance between the levels. (For example, if a factor has levels "brick", "sky" and "purple", how far is it from "brick" to "purple"?)

- Even if a factor is naturally ordered (e.g. "small", "medium", "large"), you would still have to decide how to measure the smoothness/wiggliness of a function of the factor.

For this reason, I think it is actually better to explicitly convert the levels of an ordered factor into numeric values on a scale that you think is appropriate, and then use that numeric variable as the covariate in a gam. In this way it's usually fairly easy to get one of the mgcv built-in smoother classes to use the notion of smoothness that you think is appropriate: if not, then it's not too hard to add a smoother class, following the template provided in ?p.spline (actually, you could use this template to write a smoother class for ordered categorical predictors).

best,
Simon
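A sketch of Simon's suggestion: map the levels of an ordered factor to numeric scores, then smooth the scores with s(). The factor `size', its five levels, the equally spaced scores 1 to 5 and the simulated binary response are all hypothetical; choosing the spacing is exactly the decision Simon says you have to make yourself.

```r
## Sketch: ordered factor -> numeric scores -> smooth term in mgcv::gam.
## All names, levels, scores and data below are hypothetical.
library(mgcv)

set.seed(2)
n <- 400
lev <- c("tiny", "small", "medium", "large", "huge")
size <- factor(sample(lev, n, replace = TRUE), levels = lev, ordered = TRUE)

## Step 1: assign numeric scores to the levels. Equal spacing is an
## assumption; use whatever spacing reflects the "distance" you believe in.
scores <- c(tiny = 1, small = 2, medium = 3, large = 4, huge = 5)
size.num <- unname(scores[as.character(size)])

## Simulated binary response depending smoothly on the score
y <- rbinom(n, size = 1, prob = plogis(sin(size.num)))
dat <- data.frame(y = y, size.num = size.num)

## Step 2: smooth the numeric score. With only 5 distinct values the basis
## dimension k must not exceed 5, so set it explicitly.
m <- gam(y ~ s(size.num, k = 4), family = binomial, data = dat)
summary(m)
```

The explicit k matters: the default basis size for a one-dimensional thin plate spline is 10, which exceeds the number of distinct covariate values, and mgcv rejects such a term. A different choice of scores changes which functions count as "smooth", which is the point of making the conversion explicit.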