Hello Folks,
Very new to R so bear with me, running 5.2 on XP. Trying to do a zero-inflated
negative binomial regression on placental scar data as dependent. Lactation,
location, number of tick larvae present and mass of mouse are independents.
Dataframe and attributes below:
Location Lac Scars Lar Mass Lacfac
1 Tullychurry 0 0 15 13.87 0
2 Somerset 0 0 0 15.60 0
3 Tollymore 0 0 3 16.43 0
4 Tollymore 0 0 0 16.55 0
5 Caledon 0 0 0 17.47 0
6 Hillsborough 1 5 0 18.18 1
7 Caledon 0 0 1 19.06 0
8 Portglenone 0 4 0 19.10 0
9 Portglenone 0 5 0 19.13 0
10 Tollymore 0 5 3 19.50 0
11 Hillsborough 1 5 0 19.58 1
12 Portglenone 0 4 0 19.76 0
13 Caledon 0 8 0 19.97 0
14 Hillsborough 1 4 0 20.02 1
15 Tullychurry 0 3 3 20.13 0
16 Hillsborough 1 5 0 20.18 1
17 LoughNavar 1 5 0 20.20 1
18 Tollymore 0 0 1 20.24 0
19 Hillsborough 1 5 0 20.48 1
20 Caledon 0 4 1 20.56 0
21 Caledon 0 3 2 20.58 0
22 Tollymore 0 4 3 20.58 0
23 Tollymore 0 0 2 20.88 0
24 Hillsborough 1 0 0 21.01 1
25 Portglenone 0 5 0 21.08 0
26 Tullychurry 0 2 5 21.28 0
27 Ballysallagh 1 4 0 21.59 1
28 Caledon 0 0 1 21.68 0
29 Hillsborough 1 5 0 22.09 1
30 Tullychurry 0 5 5 22.28 0
31 Tullychurry 1 6 75 22.43 1
32 Ballysallagh 1 5 0 22.57 1
33 Ballysallagh 1 4 0 22.67 1
34 LoughNavar 1 5 3 22.71 1
35 Hillsborough 1 4 0 23.01 1
36 Caledon 0 0 3 23.08 0
37 LoughNavar 1 5 0 23.53 1
38 Ballysallagh 1 4 0 23.55 1
39 Portglenone 1 6 0 23.61 1
40 Mt.Stewart 0 3 0 23.70 0
41 Somerset 0 5 0 23.83 0
42 Ballysallagh 1 5 0 23.93 1
43 Ballysallagh 1 5 0 24.01 1
44 Caledon 0 0 3 24.14 0
45 LoughNavar 0 6 0 24.30 0
46 LoughNavar 1 5 0 24.34 1
47 Hillsborough 1 4 0 24.45 1
48 Caledon 0 3 2 24.55 0
49 Tullychurry 0 5 44 24.83 0
50 Hillsborough 1 5 0 24.86 1
51 Ballysallagh 1 5 0 25.02 1
52 Tullychurry 0 0 9 25.27 0
53 Mt.Stewart 0 5 0 25.31 0
54 LoughNavar 1 4 8 25.43 1
55 Somerset 1 0 0 25.58 1
56 Hillsborough 1 5 0 25.82 1
57 Portglenone 1 2 0 26.02 1
58 Ballysallagh 1 5 0 26.19 1
59 Mt.Stewart 1 0 0 26.66 1
60 Randalstown 1 0 1 26.70 1
61 Somerset 0 4 0 27.01 0
62 Mt.Stewart 0 4 0 27.05 0
63 Somerset 0 3 0 27.10 0
64 Somerset 0 6 0 27.34 0
65 Somerset 0 0 0 27.87 0
66 LoughNavar 1 5 1 28.01 1
67 Tullychurry 1 6 42 28.55 1
68 Hillsborough 1 5 0 28.84 1
69 Portglenone 1 4 0 29.00 1
70 Somerset 1 4 0 31.87 1
71 Ballysallagh 1 5 0 33.06 1
72 LoughNavar 1 4 0 33.24 1
73 Somerset 1 4 0 33.36 1
alan : 'data.frame': 73 obs. of 6 variables:
$ Location: Factor w/ 10 levels "Ballysallagh",..: 10 8 9 9 2 3 2 6 6
9 ...
$ Lac : int 0 0 0 0 0 1 0 0 0 0 ...
$ Scars : int 0 0 0 0 0 5 0 4 5 5 ...
$ Lar : int 15 0 3 0 0 0 1 0 0 3 ...
$ Mass : num 13.9 15.6 16.4 16.6 17.5 ...
$ Lacfac : Factor w/ 2 levels "0","1": 1 1 1 1 1 2 1 1 1 1
...
The syntax I used to create the model is:
zinb.zc <- zicounts(resp=Scars~.,x =~Location + Lar + Mass + Lar:Mass +
Location:Mass,z =~Location + Lar + Mass + Lar:Mass + Location:Mass, data=alan)
The error given is:
Error in optim(par = parm, fn = neg.like, gr = neg.grad, hessian = TRUE, :
non-finite value supplied by optim
In addition: Warning message:
fitted probabilities numerically 0 or 1 occurred in: glm.fit(zz, 1 - pmin(y, 1),
family = binomial())
I understand this is a problem with the model I specified, could anyone help
out??
Many thanks
Alan Harrison
Quercus
Queen's University Belfast
MBC, 97 Lisburn Road
Belfast
BT9 7BL
T: 02890 972219
M: 07798615682
[[alternative HTML version deleted]]
Lac and Lacfac are the same. On 8/21/07, Alan Harrison <tharrison01 at qub.ac.uk> wrote:> Hello Folks, > > Very new to R so bear with me, running 5.2 on XP. Trying to do a zero-inflated negative binomial regression on placental scar data as dependent. Lactation, location, number of tick larvae present and mass of mouse are independents. Dataframe and attributes below: > > > Location Lac Scars Lar Mass Lacfac > 1 Tullychurry 0 0 15 13.87 0 > 2 Somerset 0 0 0 15.60 0 > 3 Tollymore 0 0 3 16.43 0 > 4 Tollymore 0 0 0 16.55 0 > 5 Caledon 0 0 0 17.47 0 > 6 Hillsborough 1 5 0 18.18 1 > 7 Caledon 0 0 1 19.06 0 > 8 Portglenone 0 4 0 19.10 0 > 9 Portglenone 0 5 0 19.13 0 > 10 Tollymore 0 5 3 19.50 0 > 11 Hillsborough 1 5 0 19.58 1 > 12 Portglenone 0 4 0 19.76 0 > 13 Caledon 0 8 0 19.97 0 > 14 Hillsborough 1 4 0 20.02 1 > 15 Tullychurry 0 3 3 20.13 0 > 16 Hillsborough 1 5 0 20.18 1 > 17 LoughNavar 1 5 0 20.20 1 > 18 Tollymore 0 0 1 20.24 0 > 19 Hillsborough 1 5 0 20.48 1 > 20 Caledon 0 4 1 20.56 0 > 21 Caledon 0 3 2 20.58 0 > 22 Tollymore 0 4 3 20.58 0 > 23 Tollymore 0 0 2 20.88 0 > 24 Hillsborough 1 0 0 21.01 1 > 25 Portglenone 0 5 0 21.08 0 > 26 Tullychurry 0 2 5 21.28 0 > 27 Ballysallagh 1 4 0 21.59 1 > 28 Caledon 0 0 1 21.68 0 > 29 Hillsborough 1 5 0 22.09 1 > 30 Tullychurry 0 5 5 22.28 0 > 31 Tullychurry 1 6 75 22.43 1 > 32 Ballysallagh 1 5 0 22.57 1 > 33 Ballysallagh 1 4 0 22.67 1 > 34 LoughNavar 1 5 3 22.71 1 > 35 Hillsborough 1 4 0 23.01 1 > 36 Caledon 0 0 3 23.08 0 > 37 LoughNavar 1 5 0 23.53 1 > 38 Ballysallagh 1 4 0 23.55 1 > 39 Portglenone 1 6 0 23.61 1 > 40 Mt.Stewart 0 3 0 23.70 0 > 41 Somerset 0 5 0 23.83 0 > 42 Ballysallagh 1 5 0 23.93 1 > 43 Ballysallagh 1 5 0 24.01 1 > 44 Caledon 0 0 3 24.14 0 > 45 LoughNavar 0 6 0 24.30 0 > 46 LoughNavar 1 5 0 24.34 1 > 47 Hillsborough 1 4 0 24.45 1 > 48 Caledon 0 3 2 24.55 0 > 49 Tullychurry 0 5 44 24.83 0 > 50 Hillsborough 1 5 0 24.86 1 > 51 Ballysallagh 1 5 0 25.02 1 > 52 Tullychurry 0 0 9 25.27 0 > 53 Mt.Stewart 0 5 0 25.31 0 > 54 LoughNavar 1 4 8 25.43 1 > 55 Somerset 1 0 0 25.58 1 > 56 Hillsborough 1 5 0 25.82 1 > 57 Portglenone 1 2 0 26.02 1 > 58 Ballysallagh 1 5 0 26.19 1 > 59 Mt.Stewart 1 0 0 26.66 1 > 60 Randalstown 1 0 1 26.70 1 > 61 Somerset 0 4 0 27.01 0 > 62 Mt.Stewart 0 4 0 27.05 0 > 63 Somerset 0 3 0 27.10 0 > 64 Somerset 0 6 0 27.34 0 > 65 Somerset 0 0 0 27.87 0 > 66 LoughNavar 1 5 1 28.01 1 > 67 Tullychurry 1 6 42 28.55 1 > 68 Hillsborough 1 5 0 28.84 1 > 69 Portglenone 1 4 0 29.00 1 > 70 Somerset 1 4 0 31.87 1 > 71 Ballysallagh 1 5 0 33.06 1 > 72 LoughNavar 1 4 0 33.24 1 > 73 Somerset 1 4 0 33.36 1 > > alan : 'data.frame': 73 obs. of 6 variables: > $ Location: Factor w/ 10 levels "Ballysallagh",..: 10 8 9 9 2 3 2 6 6 9 ... > $ Lac : int 0 0 0 0 0 1 0 0 0 0 ... > $ Scars : int 0 0 0 0 0 5 0 4 5 5 ... > $ Lar : int 15 0 3 0 0 0 1 0 0 3 ... > $ Mass : num 13.9 15.6 16.4 16.6 17.5 ... > $ Lacfac : Factor w/ 2 levels "0","1": 1 1 1 1 1 2 1 1 1 1 ... > > The syntax I used to create the model is: > > zinb.zc <- zicounts(resp=Scars~.,x =~Location + Lar + Mass + Lar:Mass + Location:Mass,z =~Location + Lar + Mass + Lar:Mass + Location:Mass, data=alan) > > The error given is: > > Error in optim(par = parm, fn = neg.like, gr = neg.grad, hessian = TRUE, : > non-finite value supplied by optim > In addition: Warning message: > fitted probabilities numerically 0 or 1 occurred in: glm.fit(zz, 1 - pmin(y, 1), family = binomial()) > > I understand this is a problem with the model I specified, could anyone help out?? > > Many thanks > > Alan Harrison > > Quercus > Queen's University Belfast > MBC, 97 Lisburn Road > Belfast > > BT9 7BL > > T: 02890 972219 > M: 07798615682 > > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >
(Hope this gets threaded properly. Sorry if it doesn't.)
Gabor: Lac and Lacfac being the same is irrelevant, wouldn't
produce NAs (but would produce something like a singular Hessian
and maybe other problems) -- but they're not even specified in this
model.
The bottom line is that you have a location with a single
observation, so the GLM that zicounts runs to get the initial
parameter values has an unestimable location:mass interaction
for one location, so it gives an NA, so optim complains.
In gruesome detail:
## set up data
scardat = read.table("scars.dat",header=TRUE)
library(zicounts)
## try to run model
zinb.zc <- zicounts(resp=Scars~.,
x =~Location + Lar + Mass + Lar:Mass + Location:Mass,
z =~Location + Lar + Mass + Lar:Mass + Location:Mass,
data=scardat)
## tried to debug this by dumping zicounts.R to a file, modifying
## it to put a "trace" argument in that would print out the parameters
## and log-likelihood for every call to the log-likelihood function.
dump("zicounts",file="zicounts.R")
source("zicounts.R")
zinb.zc <- zicounts(resp=Scars~.,
x =~Location + Lar + Mass + Lar:Mass + Location:Mass,
z =~Location + Lar + Mass + Lar:Mass + Location:Mass,
data=scardat,trace=TRUE)
## this actually didn't do any good because the negative log-likelihood
## function never gets called -- as it turns out optim() barfs when it
## gets its initial values, before it ever gets to evaluating the
log-likelihood
## check the glm -- this is the equivalent of what zicounts does to
## get the initial values of the x parameters
p1 <- glm(Scars~Location + Lar + Mass + Lar:Mass + Location:Mass,
data=scardat,family="poisson")
which(is.na(coef(p1)))
## find out what the deal is
table(scardat$Location)
scar2 = subset(scardat,Location!="Randalstown")
## first step to removing the bad point from the data set -- but ...
table(scar2$Location)
## it leaves the Location factor with the same levels, so
## now we have ZERO counts for one location:
## redefine the factor to drop unused levels
scar2$Location <- factor(scar2$Location)
## OK, looks fine now
table(scar2$Location)
zinb.zc <- zicounts(resp=Scars~.,
x =~Location + Lar + Mass + Lar:Mass + Location:Mass,
z =~Location + Lar + Mass + Lar:Mass + Location:Mass,
data=scar2)
## now we get another error ("system is computationally singular" when
## trying to compute Hessian -- overparameterized?) Not in any
## trivial way that I can see. It would be nice to get into the guts
## of zicounts and stop it from trying to invert the Hessian, which is
## I think where this happens.
In the meanwhile, I have some other ideas about this analysis (sorry,
but you started it ...)
Looking at the data in a few different ways:
library(lattice)
xyplot(Scars~Mass,groups=Location,data=scar2,jitter=TRUE,
auto.key=list(columns=3))
xyplot(Scars~Mass|Location,data=scar2,jitter=TRUE)
xyplot(Scars~Lar,groups=Location,data=scar2,
auto.key=list(columns=3))
xyplot(Scars~Mass|Lar,data=scar2)
xyplot(Scars~Lar|Location,data=scar2)
Some thoughts: (1) I'm not at all sure that
zero-inflation is necessary (see Warton 2005, Environmentrics).
This is a fairly small, noisy data set without huge numbers
of zeros -- a plain old negative binomial might be fine.
I don't actually see a lot of signal here, period (although there may
be some) ...
there's not a huge range in Lar (whatever it is -- the rest of the
covariates I
think I can interpret). It would be tempting to try to fit location as
a random
effect, because fitting all those extra degrees of freedom is going to
kill you.
On the other hand, GLMMs are a bit hairy.
cheers
Ben