thr3ads.net - R help - [R] Optimization problem [Aug 2007]

If this information is useful, please help other people find it:
Share via:

Alan Harrison

2007-Aug-21 16:21 UTC

[R] Optimization problem

Hello Folks,

Very new to R so bear with me, running 5.2 on XP.  Trying to do a zero-inflated
negative binomial regression on placental scar data as dependent.  Lactation,
location, number of tick larvae present and mass of mouse are independents. 
Dataframe and attributes below:


 Location         Lac Scars Lar Mass Lacfac
1   Tullychurry   0     0  15 13.87      0
2      Somerset   0     0   0 15.60      0
3     Tollymore   0     0   3 16.43      0
4     Tollymore   0     0   0 16.55      0
5       Caledon   0     0   0 17.47      0
6  Hillsborough   1     5   0 18.18      1
7       Caledon   0     0   1 19.06      0
8   Portglenone   0     4   0 19.10      0
9   Portglenone   0     5   0 19.13      0
10    Tollymore   0     5   3 19.50      0
11 Hillsborough   1     5   0 19.58      1
12  Portglenone   0     4   0 19.76      0
13      Caledon   0     8   0 19.97      0
14 Hillsborough   1     4   0 20.02      1
15  Tullychurry   0     3   3 20.13      0
16 Hillsborough   1     5   0 20.18      1
17   LoughNavar   1     5   0 20.20      1
18    Tollymore   0     0   1 20.24      0
19 Hillsborough   1     5   0 20.48      1
20      Caledon   0     4   1 20.56      0
21      Caledon   0     3   2 20.58      0
22    Tollymore   0     4   3 20.58      0
23    Tollymore   0     0   2 20.88      0
24 Hillsborough   1     0   0 21.01      1
25  Portglenone   0     5   0 21.08      0
26  Tullychurry   0     2   5 21.28      0
27 Ballysallagh   1     4   0 21.59      1
28      Caledon   0     0   1 21.68      0
29 Hillsborough   1     5   0 22.09      1
30  Tullychurry   0     5   5 22.28      0
31  Tullychurry   1     6  75 22.43      1
32 Ballysallagh   1     5   0 22.57      1
33 Ballysallagh   1     4   0 22.67      1
34   LoughNavar   1     5   3 22.71      1
35 Hillsborough   1     4   0 23.01      1
36      Caledon   0     0   3 23.08      0
37   LoughNavar   1     5   0 23.53      1
38 Ballysallagh   1     4   0 23.55      1
39  Portglenone   1     6   0 23.61      1
40   Mt.Stewart   0     3   0 23.70      0
41     Somerset   0     5   0 23.83      0
42 Ballysallagh   1     5   0 23.93      1
43 Ballysallagh   1     5   0 24.01      1
44      Caledon   0     0   3 24.14      0
45   LoughNavar   0     6   0 24.30      0
46   LoughNavar   1     5   0 24.34      1
47 Hillsborough   1     4   0 24.45      1
48      Caledon   0     3   2 24.55      0
49  Tullychurry   0     5  44 24.83      0
50 Hillsborough   1     5   0 24.86      1
51 Ballysallagh   1     5   0 25.02      1
52  Tullychurry   0     0   9 25.27      0
53   Mt.Stewart   0     5   0 25.31      0
54   LoughNavar   1     4   8 25.43      1
55     Somerset   1     0   0 25.58      1
56 Hillsborough   1     5   0 25.82      1
57  Portglenone   1     2   0 26.02      1
58 Ballysallagh   1     5   0 26.19      1
59   Mt.Stewart   1     0   0 26.66      1
60  Randalstown   1     0   1 26.70      1
61     Somerset   0     4   0 27.01      0
62   Mt.Stewart   0     4   0 27.05      0
63     Somerset   0     3   0 27.10      0
64     Somerset   0     6   0 27.34      0
65     Somerset   0     0   0 27.87      0
66   LoughNavar   1     5   1 28.01      1
67  Tullychurry   1     6  42 28.55      1
68 Hillsborough   1     5   0 28.84      1
69  Portglenone   1     4   0 29.00      1
70     Somerset   1     4   0 31.87      1
71 Ballysallagh   1     5   0 33.06      1
72   LoughNavar   1     4   0 33.24      1
73     Somerset   1     4   0 33.36      1

alan : 'data.frame':    73 obs. of  6 variables:
 $ Location: Factor w/ 10 levels "Ballysallagh",..: 10 8 9 9 2 3 2 6 6
9 ...
 $ Lac     : int  0 0 0 0 0 1 0 0 0 0 ...
 $ Scars   : int  0 0 0 0 0 5 0 4 5 5 ...
 $ Lar     : int  15 0 3 0 0 0 1 0 0 3 ...
 $ Mass    : num  13.9 15.6 16.4 16.6 17.5 ...
 $ Lacfac  : Factor w/ 2 levels "0","1": 1 1 1 1 1 2 1 1 1 1
...

The syntax I used to create the model is:

zinb.zc <- zicounts(resp=Scars~.,x =~Location + Lar + Mass + Lar:Mass +
Location:Mass,z =~Location + Lar + Mass + Lar:Mass + Location:Mass, data=alan)

The error given is:

Error in optim(par = parm, fn = neg.like, gr = neg.grad, hessian = TRUE,  : 
        non-finite value supplied by optim
In addition: Warning message:
fitted probabilities numerically 0 or 1 occurred in: glm.fit(zz, 1 - pmin(y, 1),
family = binomial())

I understand this is a problem with the model I specified, could anyone help
out??

Many thanks

Alan Harrison

Quercus
Queen's University Belfast
MBC, 97 Lisburn Road
Belfast

BT9 7BL

T: 02890 972219
M: 07798615682


	[[alternative HTML version deleted]]

Gabor Grothendieck

2007-Aug-21 19:38 UTC

head link

[R] Optimization problem

Lac and Lacfac are the same.

On 8/21/07, Alan Harrison <tharrison01 at qub.ac.uk>
wrote:> Hello Folks,
>
> Very new to R so bear with me, running 5.2 on XP.  Trying to do a
zero-inflated negative binomial regression on placental scar data as dependent. 
Lactation, location, number of tick larvae present and mass of mouse are
independents.  Dataframe and attributes below:
>
>
>  Location         Lac Scars Lar Mass Lacfac
> 1   Tullychurry   0     0  15 13.87      0
> 2      Somerset   0     0   0 15.60      0
> 3     Tollymore   0     0   3 16.43      0
> 4     Tollymore   0     0   0 16.55      0
> 5       Caledon   0     0   0 17.47      0
> 6  Hillsborough   1     5   0 18.18      1
> 7       Caledon   0     0   1 19.06      0
> 8   Portglenone   0     4   0 19.10      0
> 9   Portglenone   0     5   0 19.13      0
> 10    Tollymore   0     5   3 19.50      0
> 11 Hillsborough   1     5   0 19.58      1
> 12  Portglenone   0     4   0 19.76      0
> 13      Caledon   0     8   0 19.97      0
> 14 Hillsborough   1     4   0 20.02      1
> 15  Tullychurry   0     3   3 20.13      0
> 16 Hillsborough   1     5   0 20.18      1
> 17   LoughNavar   1     5   0 20.20      1
> 18    Tollymore   0     0   1 20.24      0
> 19 Hillsborough   1     5   0 20.48      1
> 20      Caledon   0     4   1 20.56      0
> 21      Caledon   0     3   2 20.58      0
> 22    Tollymore   0     4   3 20.58      0
> 23    Tollymore   0     0   2 20.88      0
> 24 Hillsborough   1     0   0 21.01      1
> 25  Portglenone   0     5   0 21.08      0
> 26  Tullychurry   0     2   5 21.28      0
> 27 Ballysallagh   1     4   0 21.59      1
> 28      Caledon   0     0   1 21.68      0
> 29 Hillsborough   1     5   0 22.09      1
> 30  Tullychurry   0     5   5 22.28      0
> 31  Tullychurry   1     6  75 22.43      1
> 32 Ballysallagh   1     5   0 22.57      1
> 33 Ballysallagh   1     4   0 22.67      1
> 34   LoughNavar   1     5   3 22.71      1
> 35 Hillsborough   1     4   0 23.01      1
> 36      Caledon   0     0   3 23.08      0
> 37   LoughNavar   1     5   0 23.53      1
> 38 Ballysallagh   1     4   0 23.55      1
> 39  Portglenone   1     6   0 23.61      1
> 40   Mt.Stewart   0     3   0 23.70      0
> 41     Somerset   0     5   0 23.83      0
> 42 Ballysallagh   1     5   0 23.93      1
> 43 Ballysallagh   1     5   0 24.01      1
> 44      Caledon   0     0   3 24.14      0
> 45   LoughNavar   0     6   0 24.30      0
> 46   LoughNavar   1     5   0 24.34      1
> 47 Hillsborough   1     4   0 24.45      1
> 48      Caledon   0     3   2 24.55      0
> 49  Tullychurry   0     5  44 24.83      0
> 50 Hillsborough   1     5   0 24.86      1
> 51 Ballysallagh   1     5   0 25.02      1
> 52  Tullychurry   0     0   9 25.27      0
> 53   Mt.Stewart   0     5   0 25.31      0
> 54   LoughNavar   1     4   8 25.43      1
> 55     Somerset   1     0   0 25.58      1
> 56 Hillsborough   1     5   0 25.82      1
> 57  Portglenone   1     2   0 26.02      1
> 58 Ballysallagh   1     5   0 26.19      1
> 59   Mt.Stewart   1     0   0 26.66      1
> 60  Randalstown   1     0   1 26.70      1
> 61     Somerset   0     4   0 27.01      0
> 62   Mt.Stewart   0     4   0 27.05      0
> 63     Somerset   0     3   0 27.10      0
> 64     Somerset   0     6   0 27.34      0
> 65     Somerset   0     0   0 27.87      0
> 66   LoughNavar   1     5   1 28.01      1
> 67  Tullychurry   1     6  42 28.55      1
> 68 Hillsborough   1     5   0 28.84      1
> 69  Portglenone   1     4   0 29.00      1
> 70     Somerset   1     4   0 31.87      1
> 71 Ballysallagh   1     5   0 33.06      1
> 72   LoughNavar   1     4   0 33.24      1
> 73     Somerset   1     4   0 33.36      1
>
> alan : 'data.frame':    73 obs. of  6 variables:
>  $ Location: Factor w/ 10 levels "Ballysallagh",..: 10 8 9 9 2 3
2 6 6 9 ...
>  $ Lac     : int  0 0 0 0 0 1 0 0 0 0 ...
>  $ Scars   : int  0 0 0 0 0 5 0 4 5 5 ...
>  $ Lar     : int  15 0 3 0 0 0 1 0 0 3 ...
>  $ Mass    : num  13.9 15.6 16.4 16.6 17.5 ...
>  $ Lacfac  : Factor w/ 2 levels "0","1": 1 1 1 1 1 2 1
1 1 1 ...
>
> The syntax I used to create the model is:
>
> zinb.zc <- zicounts(resp=Scars~.,x =~Location + Lar + Mass + Lar:Mass +
Location:Mass,z =~Location + Lar + Mass + Lar:Mass + Location:Mass, data=alan)
>
> The error given is:
>
> Error in optim(par = parm, fn = neg.like, gr = neg.grad, hessian = TRUE,  :
>        non-finite value supplied by optim
> In addition: Warning message:
> fitted probabilities numerically 0 or 1 occurred in: glm.fit(zz, 1 -
pmin(y, 1), family = binomial())
>
> I understand this is a problem with the model I specified, could anyone
help out??
>
> Many thanks
>
> Alan Harrison
>
> Quercus
> Queen's University Belfast
> MBC, 97 Lisburn Road
> Belfast
>
> BT9 7BL
>
> T: 02890 972219
> M: 07798615682
>
>
>        [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

Ben Bolker

2007-Aug-22 01:59 UTC

head link

[R] Optimization problem

(Hope this gets threaded properly.  Sorry if it doesn't.)

   Gabor: Lac and Lacfac being the same is irrelevant, wouldn't
produce NAs (but would produce something like a singular Hessian
and maybe other problems) -- but they're not even specified in this
model.

  The bottom line is that you have a location with a single
observation, so the GLM that zicounts runs to get the initial
parameter values has an unestimable location:mass interaction
for one location, so it gives an NA, so optim complains.

  In gruesome detail:

## set up  data
scardat = read.table("scars.dat",header=TRUE)
library(zicounts)
## try to run model
zinb.zc <- zicounts(resp=Scars~.,
                    x =~Location + Lar + Mass + Lar:Mass + Location:Mass,
                    z =~Location + Lar + Mass + Lar:Mass + Location:Mass,
                    data=scardat)
## tried to debug this by dumping zicounts.R to a file, modifying
## it to put a "trace" argument in that would print out the parameters
## and log-likelihood for every call to the log-likelihood function.
dump("zicounts",file="zicounts.R")
source("zicounts.R")
zinb.zc <- zicounts(resp=Scars~.,
                    x =~Location + Lar + Mass + Lar:Mass + Location:Mass,
                    z =~Location + Lar + Mass + Lar:Mass + Location:Mass,
                    data=scardat,trace=TRUE)
## this actually didn't do any good because the negative log-likelihood
## function never gets called -- as it turns out optim() barfs when it
## gets its initial values, before it ever gets to evaluating the 
log-likelihood

## check the glm -- this is the equivalent of what zicounts does to
## get the initial values of the x parameters
p1 <- glm(Scars~Location + Lar + Mass + Lar:Mass + Location:Mass,
          data=scardat,family="poisson")
which(is.na(coef(p1)))

## find out what the deal is
table(scardat$Location)

scar2 = subset(scardat,Location!="Randalstown")
## first step to removing the bad point from the data set -- but ...
table(scar2$Location)
## it leaves the Location factor with the same levels, so
##  now we have ZERO counts for one location:
## redefine the factor to drop unused levels
scar2$Location <- factor(scar2$Location)
## OK, looks fine now
table(scar2$Location)

zinb.zc <- zicounts(resp=Scars~.,
                    x =~Location + Lar + Mass + Lar:Mass + Location:Mass,
                    z =~Location + Lar + Mass + Lar:Mass + Location:Mass,
                    data=scar2)
## now we get another error ("system is computationally singular" when
## trying to compute Hessian -- overparameterized?)   Not in any
## trivial way that I can see.  It would be nice to get into the guts
## of zicounts and stop it from trying to invert the Hessian, which is
## I think where this happens.

  In the meanwhile, I have some other  ideas about this analysis (sorry,
but you started it ...)

  Looking at the data in a few different ways:

library(lattice)
xyplot(Scars~Mass,groups=Location,data=scar2,jitter=TRUE,
       auto.key=list(columns=3))
xyplot(Scars~Mass|Location,data=scar2,jitter=TRUE)

xyplot(Scars~Lar,groups=Location,data=scar2,
       auto.key=list(columns=3))
xyplot(Scars~Mass|Lar,data=scar2)
xyplot(Scars~Lar|Location,data=scar2)

   Some thoughts: (1) I'm not at all sure that
zero-inflation is necessary (see Warton 2005, Environmentrics).
This is a fairly small, noisy data set without huge numbers
of zeros -- a plain old negative binomial might be fine.
 
   I don't actually see a lot of signal here, period (although there may 
be some) ...
there's not a huge range in Lar (whatever it is -- the rest of the 
covariates I
think I can interpret).  It would be tempting to try to fit location as 
a random
effect, because fitting all those extra degrees of freedom is going to 
kill you.
On the other hand, GLMMs are a bit hairy.

   cheers
       Ben

Seemingly Similar Threads

Search for more seemingly similar threads

R help - Aug 2007 - Optimization problem

[R] Optimization problem

[R] Optimization problem

[R] Optimization problem

Seemingly Similar Threads