I think having groups of different sizes is going to be a problem in
your second model. ?corSymm says that a general correlation matrix is
estimated, i.e. a separate correlation for each pair of observations
within a group. For this to be meaningful across groups, the jth price
in one area has to be somehow equivalent to the jth price in another
area, which it probably isn't, and I can't see how it can be done at all
if the groups are different sizes.
Even if your group sizes were all the same, I guess you have lots of
data per neighbourhood, so there will be an awful lot of correlation
parameters to estimate (n(n-1)/2 for groups of size n), and I doubt that
it will be successful. Might it make more sense to start with something
less parameter rich like corCompSymm (which would also be ok with
different group sizes, I think)?
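To make the parameter count concrete, and to show what the corCompSymm
alternative might look like, here is a rough sketch on simulated data.
The variable names (nbhd, x, y), the group sizes and the Gaussian
response are my assumptions for illustration only, not your data or
your Gamma model.

```r
library(mgcv)  # gamm()
library(nlme)  # corCompSymm()

# corSymm estimates one correlation per pair of observations in a group,
# i.e. n(n-1)/2 parameters for a group of size n (hypothetical sizes).
n_per_group <- c(10, 50, 200)
n_corr <- n_per_group * (n_per_group - 1) / 2
n_corr

# Simulated, unbalanced groups sharing a within-group error component.
set.seed(1)
sizes <- c(5, 8, 12, 20, 7, 15, 10, 23)
nbhd <- factor(rep(seq_along(sizes), times = sizes))
n <- length(nbhd)
x <- runif(n)
shared <- rnorm(nlevels(nbhd), sd = 0.5)
y <- sin(2 * pi * x) + shared[as.integer(nbhd)] + rnorm(n, sd = 0.3)
dat <- data.frame(y = y, x = x, nbhd = nbhd)

# corCompSymm has a single correlation parameter, whatever the group size.
m <- gamm(y ~ s(x), correlation = corCompSymm(form = ~ 1 | nbhd), data = dat)
m$lme$modelStruct$corStruct  # estimated within-group correlation
summary(m$gam)               # smooth terms
```

Note the "=" in form = ~ 1 | nbhd: the grouping formula is passed as the
form argument of the correlation structure.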
Finally I would just set data$neighborhood <- factor(data$neighborhood)
for this. You need this, e.g., to be sure that s(neighborhood,bs="re")
is really doing what you want (i.e. giving a random coefficient for each
neighbourhood, rather than a single random coefficient multiplying
"neighborhood" interpreted as numeric). However, if neighborhood is
already in the model as a factor, then s(neighborhood,bs="re") adds
nothing: you've effectively already included neighborhood as a random
effect with infinite variance, so including it again won't do anything
interesting.
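A quick way to see the numeric versus factor difference is to count the
coefficients that the "re" term gets in each case. This is only a sketch
on simulated data; nbhd, nbhd_f, x and y are made-up names and the
response is Gaussian for simplicity.

```r
library(mgcv)

set.seed(2)
dat <- data.frame(nbhd = rep(1:6, times = c(10, 15, 20, 12, 18, 25)))
dat$x <- runif(nrow(dat))
dat$y <- sin(2 * pi * dat$x) + rnorm(6)[dat$nbhd] +
  rnorm(nrow(dat), sd = 0.3)

# Numeric code: one random coefficient multiplying the neighbourhood number.
m_num <- gam(y ~ s(x) + s(nbhd, bs = "re"), data = dat, method = "REML")

# Factor: one random coefficient per neighbourhood.
dat$nbhd_f <- factor(dat$nbhd)
m_fac <- gam(y ~ s(x) + s(nbhd_f, bs = "re"), data = dat, method = "REML")

# Count the coefficients belonging to the random effect term in each fit.
n_num <- sum(grepl("nbhd", names(coef(m_num)), fixed = TRUE))
n_fac <- sum(grepl("nbhd_f", names(coef(m_fac)), fixed = TRUE))
c(numeric = n_num, factor = n_fac)
```

With six neighbourhoods, the factor version should carry six random
coefficients and the numeric version just one.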
best,
Simon
On 11/07/13 15:46, Kathrine Veie wrote:
> Dear Help list,
>
> I am relatively new to the mgcv package, which I am using to model prices
> of housing transactions as a function of the characteristics of a home and
> a neighborhood. I have several smooth terms to capture price evolution over
> time, but also to non-parametrically fit the functional form of some
> characteristics such as living area, lot size etc. In my model I have
> neighborhood fixed effects (i.e. prices in different neighborhoods can have
> different means), but I would also like to allow for within neighborhood
> correlation in my errors. My question is: What is the best way to do this?
>
> Sample size is approx. 14,000 obs.
>
> My model (without clustered residuals) looks something like (although I
> have many more regressors, several of which are factor variables):
> mod.1 <- gam(Price~ s(date.of.sale) + s(livingspace) + s(lotsize) +
>   factor(neighborhood), data=data, family=Gamma(link=log))
>
> I was thinking that I could either include random effects at the
> neighborhood level (s(neighborhood, bs="re")) or I could use a GAMM
> with correlated errors within group:
>
> mod.2 <- gamm(Price~ s(date.of.sale) + s(livingspace) + s(lotsize) +
>   factor(neighborhood), correlation=corSymm(form~1|neighborhood), data=data,
>   family=Gamma(link=log))
>
> I tried out mod.1 with the random effects and it did provide larger
> s.e.'s as I would expect given positive correlation in the residuals. But
> it also seemed that the random effects component was not significant if I
> understand it correctly: the edf are very close to zero and the
> significance is NaN. Perhaps if this is the way to go, I should first
> demean the data at the neighborhood level?
>
> As for the gamm approach: I can't get it to work properly: It does not
> recognize my groups (i.e. it defines only one group). I tried to correct
> for this by transforming the neighborhood numbers into characters
> neighborhood.c <- (as.character)
> and then used this as the group indicator instead:
> corSymm(form~1|neighborhood.c)
>
> But this resulted in an error message: variable lengths differ (found for
> 'neighborhood.c')...The same happens when I write
> "factor(neighborhood)" in the corSymm specification. My panel is not
> balanced, i.e. the number of observations within neighborhoods varies. Is
> this a problem? I haven't seen any indication that the panel must be
> balanced to use lme, but maybe I've missed it?
>
> Any feedback would be much appreciated incl. suggestions on where I might
> read more about how to use mgcv for this type of problem.
>
> Thanks in advance!
> Kathrine