thr3ads.net - R help - [R] strucchange Fstats() example [May 2012]

If this information is useful, please help other people find it:
Share via:

Mabille, Geraldine

2012-May-29 10:05 UTC

[R] strucchange Fstats() example

Dear all,
I'm trying to understand how the strucchange package is working and I have
been looking at the examples given for the Fstats() function.
The first example (Nile), shows one peak in the F-stats and one breakpoint is
estimated, that can be plotted using the following code

## Nile data with one breakpoint: the annual flows drop in 1898
## because the first Ashwan dam was built
data("Nile")
plot(Nile)
## test the null hypothesis that the annual flow remains constant
## over the years
fs.nile <- Fstats(Nile ~ 1)
plot(fs.nile, alpha=0.05)
sctest(fs.nile)
## visualize the breakpoint implied by the argmax of the F statistics
plot(Nile)
lines(breakpoints(fs.nile))


In the second example, the authors state the presence of "at least"
two breakpoints. When plotting the F-statistics using the following code, we see
indeed two peaks in the F-statistics, that coincides with the dates given by the
authors: c.a 1973 and 1983 but when trying to add those breakpoints to the time
series, only one is taken into account


## UK Seatbelt data: a SARIMA(1,0,0)(1,0,0)_12 model
## (fitted by OLS) is used and reveals (at least) two
## breakpoints - one in 1973 associated with the oil crisis and
## one in 1983 due to the introduction of compulsory
## wearing of seatbelts in the UK.
data("UKDriverDeaths")
seatbelt <- log10(UKDriverDeaths)
seatbelt <- cbind(seatbelt, lag(seatbelt, k = -1), lag(seatbelt, k = -12))
colnames(seatbelt) <- c("y", "ylag1", "ylag12")
seatbelt <- window(seatbelt, start = c(1970, 1), end = c(1984,12))
plot(seatbelt[,"y"], ylab = expression(log[10](casualties)))
## compute F statistics for potential breakpoints between
## 1971(6) (corresponds to from = 0.1) and 1983(6) (corresponds to
## to = 0.9 = 1 - from, the default)
## compute F statistics
fs <- Fstats(y ~ ylag1 + ylag12, data = seatbelt, from = 0.1)
## this gives the same result
fs <- Fstats(y ~ ylag1 + ylag12, data = seatbelt, from = c(1971, 6),
to = c(1983, 6))
## plot the F statistics
plot(fs, alpha = 0.05)

###TRYING to plot the 2 breakpoints:
lines(breakpoints(fs,breaks=2))

...only one breakpoint is plotted. When looking at the results from
> breakpoints(fs,breaks=2)
Optimal 2-segment partition:

Call: breakpoints.Fstats(obj = fs, breaks = 2)
Breakpoints at observation number:
46
Corresponding to breakdates:
1973(10)

We see that even though the F-statistics seem to show the existence of 2
breakpoints, only one is detected by the breakpoints() function. Does anyone
know how this is possible? I'm totally new to strucchange so it might well
be something obvious I'm missing here!


OTHER SIDE QUESTION: can strucchange be used if the y variable is binary???

Thanks heaps if anyone can help
Geraldine

From: Mabille, Geraldine
Sent: 29. mai 2012 11:30
To: 'r-help@r-project.org'
Subject: GAM interactions, by example

Dear all,
I'm using the mgcv library by Simon Wood to fit gam models with interactions
and I have been reading (and running) the "factor 'by' variable
example"   given on the gam.models help page (see below, output from the
two first models b, and b1).
The example explains that both b and b1 fits are similar: "note that the
preceding fit (here b) is the same as (b1)...."
I agree with the idea that it "looks" the same but when I look at the
results from both models (summary b and summary b1) I see that the results look
in fact quite different (edf, and also deviance explained for example???)
Are those two models (b and b1) really testing the same things??? If yes, why
are the results so different between models???
Thanks a lot if anyone can help with that...
Geraldine


dat <- gamSim(4)

## fit model...
b <- gam(y ~ fac+s(x2,by=fac)+s(x0),data=dat)
plot(b,pages=1)
summary(b)

Family: gaussian
Link function: identity

Formula:
y ~ fac + s(x2, by = fac) + s(x0)

Parametric coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   1.1784     0.1985   5.937 6.59e-09 ***
fac2         -1.2148     0.2807  -4.329 1.92e-05 ***
fac3          2.2012     0.2436   9.034  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05
'.' 0.1 ' ' 1

Approximate significance of smooth terms:
             edf Ref.df      F  p-value
s(x2):fac1 5.364  6.472  2.285   0.0312 *
s(x2):fac2 4.523  5.547 11.396 4.59e-11 ***
s(x2):fac3 8.024  8.741 43.456  < 2e-16 ***
s(x0)      1.000  1.000  0.237   0.6269
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05
'.' 0.1 ' ' 1

R-sq.(adj) =  0.634   Deviance explained = 65.3%
GCV score = 4.0288  Scale est. = 3.8082    n = 400

## note that the preceding fit is the same as....
b1<-gam(y ~ s(x2,by=as.numeric(fac==1))+s(x2,by=as.numeric(fac==2))+
            s(x2,by=as.numeric(fac==3))+s(x0)-1,data=dat)
## ... the `-1' is because the intercept is confounded with the
## *uncentred* smooths here.
plot(b1,pages=1)
summary(b1)

Family: gaussian
Link function: identity

Formula:
y ~ s(x2, by = as.numeric(fac == 1)) + s(x2, by = as.numeric(fac =    2)) +
s(x2, by = as.numeric(fac == 3)) + s(x0) - 1

Approximate significance of smooth terms:
                             edf Ref.df       F  p-value
s(x2):as.numeric(fac == 1) 6.341  7.447   6.214 3.38e-07 ***
s(x2):as.numeric(fac == 2) 3.393  3.961  14.727 4.07e-11 ***
s(x2):as.numeric(fac == 3) 9.015  9.737 104.760  < 2e-16 ***
s(x0)                      1.000  1.000   0.266    0.606
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05
'.' 0.1 ' ' 1

R-sq.(adj) =  0.631   Deviance explained =   75%
GCV score = 4.0345  Scale est. = 3.8353    n = 400


	[[alternative HTML version deleted]]

Achim Zeileis

2012-May-30 06:22 UTC

head link

[R] strucchange Fstats() example

On Tue, 29 May 2012, Mabille, Geraldine wrote:

<snip>
> In the second example, the authors state the presence of "at
least" two
> breakpoints. When plotting the F-statistics using the following code, we
see
> indeed two peaks in the F-statistics, that coincides with the dates given
by
> the authors: c.a 1973 and 1983 but when trying to add those breakpoints to 
> the time series, only one is taken into account
The breakpoints() method for "Fstats" objects can just extract a
single
breakpoint. The reason is that maximizing the F statistics is equivalent to 
minimizing the residual sum of squares of a model with a single breakpoint. If 
you want to estimate more than a single breakpoint, you need to minimize the 
corresponding segmented sums of squares. This can be done with the formula 
method of breakpoints(), see ?breakpoints.

More specificially: In your example with breakpoints(fs, breaks = 2), the 
breaks argument is simply ignored. The method just does not have a breaks 
argument and it goes through ...
> We see that even though the F-statistics seem to show the existence of 2 
> breakpoints, only one is detected by the breakpoints() function. Does
anyone
> know how this is possible? I'm totally new to strucchange so it might
well be
> something obvious I'm missing here!
Please have a closer look at the package's documentation and the
corresponding
papers. See citation("strucchange") for the most important references
and the
corresponding manual pages for more details. For the breakpoints issue you 
should probably start reading the CSDA paper.
> OTHER SIDE QUESTION: can strucchange be used if the y variable is binary???
Testing for breakpoints can be done with the function gefp(). See its manual 
pages for references and details. The manual page just has a Poisson GLM 
example but the corresponding papers (in Stat Neerl and CSDA) also have binary 
response examples.

If you have a binary response and just want to test whether the proportion of 
successes changes across "time" (or some other variable of interest),
then
maxstat_test() from package "coin" might be an interesting
nonparametric
alternative.

hth,
Z

Reasonably Related Threads

Search for more maybe matching threads

R help - May 2012 - strucchange Fstats() example

[R] strucchange Fstats() example

[R] strucchange Fstats() example

Reasonably Related Threads