Following up on Duncan's answer...
You can see that the model doesn't fit with a little bit of residual
checking. For example
gam.check(Model)
shows that the binomial assumption is just wrong here: there is massive
overdispersion, for example. Even the crudest response to this, of using
quasibinomial, then gives a non-significant s(x1) in the examples I've
tried.
best,
Simon
On 10/10/15 12:40, Duncan Murdoch wrote:> On 09/10/2015 8:00 PM, Erin Conlisk wrote:
>> Hello,
>>
>> I am having trouble testing for the significance using a binomial model
in
>> gam{mgcv}. Have I stumbled on a bug? I doubt I would be so lucky, so
>> could someone tell me what I am doing wrong?
>>
>> Please see the following code:
>> ________________________________
>>
>> # PROBLEM USING cbind
>>
>> x1 <- runif(500, 0, 100) # Create 500 random variables to use as my
>> explanatory variable
>>
>> y1 <- floor(runif(500, 0, 100)) # Create 500 random counts to serve
as
>> binomial "successes"
>>
>> y2 <- 100-y1 # Create 500 binomial "failures", assuming a
total of 100
>> trials and the successes recorded in y1
>>
>> Model <- gam(cbind(y1, y2) ? 1 + s(x1), family=binomial)
>> summary(Model)
>> ________________________________
>>
>> The result is that my random variable, x1, is highly significant. This
>> can't be right...
> The problem is that statistical significance of a test doesn't mean
that
> the alternative you have in mind is right, it just means that the null
> is wrong (or you were unlucky, but let's ignore that).
>
> The null hypothesis here is that all of the y1 values are independent
> binomial values with a common n=100 and common unspecifed probability of
> success.
>
> Since in fact y1 comes from a discrete uniform distribution, that's
> false, and the p-value is not anywhere near being a random Unif(0,1) value.
>
> If you wanted the null hypothesis to be true, you'd need to choose a
> success probability p somehow, then set y1 <- rbinom(500, size=100, prob
> = p). The random uniform has a far larger variance, and that leads to a
> larger deviance in gam, hence significance.
>
> Duncan Murdoch
>
>> So what happens when I change the observations from a "batch"
of 100 trials
>> to individual successes and failures?
>> ________________________________
>>
>> # NOW MAKE ALL THESE DATA 0 and 1
>>
>> r01<-rep(0,500)
>> data01<-cbind(x1, y1, y2, r01)
>> rownames(data01)<-seq(1,500, 1)
>> colnames(data01)<-c('x1', 'y1', 'y2',
'r01')
>> data01<-data.frame(data01)
>>
>> expanded0 <- data01[rep(row.names(data01), data01$y1), 1:4] #
Creates a
>> replicate of the # explanatory variables for each individual
"success"
>>
>> r01<-rep(1,500)
>> data01<-cbind(x1, y1, y2, r01)
>> rownames(data01)<-seq(1,500, 1)
>> colnames(data01)<-c('x1', 'y1', 'y2',
'r01')
>> data01<-data.frame(data01)
>>
>> expanded1 <- data01[rep(row.names(data01), data01$y2), 1:4] #
Creates a
>> replicate of the # explanatory variables for each individual
"failure"
>>
>> data01<-rbind(expanded0,expanded1)
>>
>> Model2 <- gam(r01 ? 1 + s(x1), family=binomial)
>> summary(Model2)
>> ___________________________________
>>
>> The result is what I expect. Now my random variable, x1, is NOT
>> significant.
>>
>> What is going on here?
>>
>> I should say that I didn't just make this up. My question arose
playing
>> with my real data, where I was getting high significance, but a
terrible
>> proportion of deviance explained.
>>
>> My apologies if this is explained elsewhere, but I have spent hours
>> searching for an answer online.
>>
>> Thank you kindly,
>> Erin Conlisk
>>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
--
Simon Wood, Mathematical Science, University of Bath BA2 7AY UK
+44 (0)1225 386603 http://people.bath.ac.uk/sw283