thr3ads.net - R help - [R] Difference in p-value obtained in an interrupted time-series analysis (containing main effects and and interaction) vs. a simple regression containing only main effects. [Jun 2025]

If this information is useful, please help other people find it:
Share via:

Sorkin, John

2025-Jun-30 03:05 UTC

[R] Difference in p-value obtained in an interrupted time-series analysis (containing main effects and and interaction) vs. a simple regression containing only main effects.

The question that follows in NOT an R question, but rather a statistics
question. I hope you will forgive my statistics question.

I am investigating interrupted time-series analysis. My data has two periods,
period 0 and period 1. In period 0 the slope is positive. In period 2 the slope
is negative:

mydata<- structure(list(period = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                                   0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1),
                        time2 = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L,
12L,
                                  13L, 14L, 15L, 16L, 1L, 2L, 3L, 4L, 5L, 6L,
7L, 8L, 9L, 10L,
                                  11L, 12L, 13L, 14L, 15L, 16L), time = 1:32, 
                        y = c(0.0190981041979942, 
                              -0.648843854645016, 2.62941433593151,
2.97383527060948, 2.68476280937483,
                              2.51909810419799, 1.85115614535498,
5.12941433593151, 5.47383527060948,
                              5.18476280937483, 5.01909810419799,
4.35115614535498, 7.62941433593151,
                              7.97383527060948, 7.68476280937483,
7.51909810419799, 6.35115614535498,
                              8.12941433593151, 6.97383527060948,
5.18476280937483, 3.51909810419799,
                              1.35115614535498, 3.12941433593151,
1.97383527060948, 0.184762809374828,
                              -1.48090189580201, -3.64884385464502,
-1.87058566406849,
                              -3.02616472939052, -4.81523719062517,
-6.48090189580201,
                              -8.64884385464502)), row.names = c(NA, -32L),
class = "data.frame")
mydata
plot(mydata$time,mydata$y)
title("Data Used in my analyses")

# Regression of y on time, period 0 (16-data points)
fit0 <- lm(y~time,data=mydata[1:16,])
summary(fit0) #(16-data points)
abline(fit0,col="green")
cat("Regression period 0:",
"beta=",summary(fit0)$coefficients["time","Estimate"],"p-value=",summary(fit0)$coefficients["time","Pr(>|t|)"],"\n")

# Regression of y on time, period 1 (16-data points)
fit1 <- lm(y~time,data=mydata[17:32,]) #(16-data points)
abline(fit1,col="red")
summary(fit1)
cat("Regression period 1:",
"beta=",summary(fit1)$coefficients["time","Estimate"],"p-value=",summary(fit1)$coefficients["time","Pr(>|t|)"],"\n")

# Regression of y on time, period 0 and 1 (32-data points)
fit2 <- lm(y~time+period+time*period,data=mydata[1:32,])
summary(fit2) #(32-data points)
cat("Regression period 1 and 2 using interaction:",
"beta=",summary(fit2)$coefficients["time","Estimate"],"p-value=",summary(fit2)$coefficients["time","Pr(>|t|)"],"\n")


Please note that the regression of y on time in period 0 (fit 0) returns
time slope=0.52358 p=2.86e-07

Please note that the regression of y on time in period 0 (fit 2) and 1 returns
time slope=0.52358 p=1.50e-09

The time slope are the same in the two models, fit0 and fit2, however the
p-values are different, 2.86e-07 (fit 0) vs. 1.50e-09 (fit 2) a two-orders of
magnitude improvement in the p-value!
I am not surprised that the time slopes are the same.  I am shocked that the
p-values are different. While fit 0 used 16 lines of data, and fit 2 used 32
lines of data, the number of values that were used to compute the time slopes in
period 0 by the two models are the same, 16. This being the case, why are the
p-values of the time slopes different in fit0 vs. fit2?

Thank you,
John
 
P.S. This question is NOT homework. I am many years beyond being a student (at
least a student in a class), but I am a teacher of a class (and a life-long
student). The question comes from a discussion I had with a student in one of my
classes.





John David Sorkin M.D., Ph.D.
Professor of Medicine, University of Maryland School of Medicine;
Associate Director for Biostatistics and Informatics, Baltimore VA Medical
Center Geriatrics Research, Education, and Clinical Center;?
PI?Biostatistics and Informatics Core, University of Maryland School of Medicine
Claude D. Pepper Older Americans Independence Center;
Senior Statistician University of Maryland Center for Vascular Research;

Division of Gerontology and Paliative Care,
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
Cell phone 443-418-5382

John Fox

2025-Jun-30 03:43 UTC

head link

[R] Difference in p-value obtained in an interrupted time-series analysis (containing main effects and and interaction) vs. a simple regression containing only main effects.

Dear John,

Without investigating your data and models in detail, it's not 
surprising that the p-values for the two coefficients differ: the models 
have different residual standard errors (and hence different coefficient 
standard errors) and different residual degrees of freedom (and hence 
are based on different t-distributions).

I hope this helps,
  John
-- 
John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
web: https://www.john-fox.ca/
--
On 2025-06-29 11:05 p.m., Sorkin, John wrote:> Caution: External email.
> 
> 
> The question that follows in NOT an R question, but rather a statistics
question. I hope you will forgive my statistics question.
> 
> I am investigating interrupted time-series analysis. My data has two
periods, period 0 and period 1. In period 0 the slope is positive. In period 2
the slope is negative:
> 
> mydata<- structure(list(period = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>                                     0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1),
>                          time2 = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L,
11L, 12L,
>                                    13L, 14L, 15L, 16L, 1L, 2L, 3L, 4L, 5L,
6L, 7L, 8L, 9L, 10L,
>                                    11L, 12L, 13L, 14L, 15L, 16L), time =
1:32,
>                          y = c(0.0190981041979942,
>                                -0.648843854645016, 2.62941433593151,
2.97383527060948, 2.68476280937483,
>                                2.51909810419799, 1.85115614535498,
5.12941433593151, 5.47383527060948,
>                                5.18476280937483, 5.01909810419799,
4.35115614535498, 7.62941433593151,
>                                7.97383527060948, 7.68476280937483,
7.51909810419799, 6.35115614535498,
>                                8.12941433593151, 6.97383527060948,
5.18476280937483, 3.51909810419799,
>                                1.35115614535498, 3.12941433593151,
1.97383527060948, 0.184762809374828,
>                                -1.48090189580201, -3.64884385464502,
-1.87058566406849,
>                                -3.02616472939052, -4.81523719062517,
-6.48090189580201,
>                                -8.64884385464502)), row.names = c(NA,
-32L), class = "data.frame")
> mydata
> plot(mydata$time,mydata$y)
> title("Data Used in my analyses")
> 
> # Regression of y on time, period 0 (16-data points)
> fit0 <- lm(y~time,data=mydata[1:16,])
> summary(fit0) #(16-data points)
> abline(fit0,col="green")
> cat("Regression period 0:",
"beta=",summary(fit0)$coefficients["time","Estimate"],"p-value=",summary(fit0)$coefficients["time","Pr(>|t|)"],"\n")
> 
> # Regression of y on time, period 1 (16-data points)
> fit1 <- lm(y~time,data=mydata[17:32,]) #(16-data points)
> abline(fit1,col="red")
> summary(fit1)
> cat("Regression period 1:",
"beta=",summary(fit1)$coefficients["time","Estimate"],"p-value=",summary(fit1)$coefficients["time","Pr(>|t|)"],"\n")
> 
> # Regression of y on time, period 0 and 1 (32-data points)
> fit2 <- lm(y~time+period+time*period,data=mydata[1:32,])
> summary(fit2) #(32-data points)
> cat("Regression period 1 and 2 using interaction:",
"beta=",summary(fit2)$coefficients["time","Estimate"],"p-value=",summary(fit2)$coefficients["time","Pr(>|t|)"],"\n")
> 
> 
> Please note that the regression of y on time in period 0 (fit 0) returns
> time slope=0.52358 p=2.86e-07
> 
> Please note that the regression of y on time in period 0 (fit 2) and 1
returns
> time slope=0.52358 p=1.50e-09
> 
> The time slope are the same in the two models, fit0 and fit2, however the
p-values are different, 2.86e-07 (fit 0) vs. 1.50e-09 (fit 2) a two-orders of
magnitude improvement in the p-value!
> I am not surprised that the time slopes are the same.  I am shocked that
the p-values are different. While fit 0 used 16 lines of data, and fit 2 used 32
lines of data, the number of values that were used to compute the time slopes in
period 0 by the two models are the same, 16. This being the case, why are the
p-values of the time slopes different in fit0 vs. fit2?
> 
> Thank you,
> John
> 
> P.S. This question is NOT homework. I am many years beyond being a student
(at least a student in a class), but I am a teacher of a class (and a life-long
student). The question comes from a discussion I had with a student in one of my
classes.
> 
> 
> 
> 
> 
> John David Sorkin M.D., Ph.D.
> Professor of Medicine, University of Maryland School of Medicine;
> Associate Director for Biostatistics and Informatics, Baltimore VA Medical
Center Geriatrics Research, Education, and Clinical Center;
> PI Biostatistics and Informatics Core, University of Maryland School of
Medicine Claude D. Pepper Older Americans Independence Center;
> Senior Statistician University of Maryland Center for Vascular Research;
> 
> Division of Gerontology and Paliative Care,
> 10 North Greene Street
> GRECC (BT/18/GR)
> Baltimore, MD 21201-1524
> Cell phone 443-418-5382
> 
> 
> 
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
https://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

R help - Jun 2025 - Difference in p-value obtained in an interrupted time-series analysis (containing main effects and and interaction) vs. a simple regression containing only main effects.

[R] Difference in p-value obtained in an interrupted time-series analysis (containing main effects and and interaction) vs. a simple regression containing only main effects.

[R] Difference in p-value obtained in an interrupted time-series analysis (containing main effects and and interaction) vs. a simple regression containing only main effects.