Paul
2009-Sep-09 20:37 UTC
[R] Stats help with calculating between and within subject variance and confidence intervals
Hello. I'm trying to find a way in R to calculate between and within subject variances and confidence intervals for some analytical method development data. I've found a reference to a method in Burdick, R. K. & Graybill, F. A. 1992, Confidence Intervals on variance components, CRC Press. This example is for Balanced Data confidence interval calculation from Pg 62. The data are fill weights from bottles sampled randomly from a sample of four filling machines. There are 12 values, and the confidence intervals are for 1-2a = 95%. I have got the same results as the book but using slightly different fomulae (see variables for H1, G1 and H12 and G12). I'd appreciate any help, and any comments on whether their is a better way to do this. Thanks Paul. > BGBottles Machine weight 1 1 14.23 2 1 14.96 3 1 14.85 4 2 16.46 5 2 16.74 6 2 15.94 7 3 14.98 8 3 14.88 9 3 14.87 10 4 15.94 11 4 16.07 12 4 14.91 > BGaov<-aov(weight~Machine,data=BGBottles) > summary(BGaov) Df Sum Sq Mean Sq F value Pr(>F) Machine 3 5.3294 1.7765 9.7702 0.004733 ** Residuals 8 1.4546 0.1818 --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 > BGaov Call: aov(formula = weight ~ Machine, data = BGBottles) Terms: Machine Residuals Sum of Squares 5.329425 1.454600 Deg. of Freedom 3 8 Residual standard error: 0.4264094 Estimated effects may be unbalanced > BGaov<-aov(weight~Machine+Error(Machine),data=BGBottles) > summary(BGaov) Error: Machine Df Sum Sq Mean Sq Machine 3 5.3294 1.7765 Error: Within Df Sum Sq Mean Sq F value Pr(>F) Residuals 8 1.45460 0.18182 > BGaov Call: aov(formula = weight ~ Machine + Error(Machine), data = BGBottles) Grand Mean: 15.4025 Stratum 1: Machine Terms: Machine Sum of Squares 5.329425 Deg. of Freedom 3 Estimated effects may be unbalanced Stratum 2: Within Terms: Residuals Sum of Squares 1.4546 Deg. of Freedom 8 Residual standard error: 0.4264094 > confint<-95 > a<-(1-(confint/100))/2 > #str(BGaov) # get structure of BGAOV object > #str(summary(BGaov)) # get structure of BGaov summary > #summary(aov.1.e)[[2]][[1]]$"Df" # Could also use this syntax > grandmean<-as.vector(BGaov$"(Intercept)"[[1]][1]) # Grand Mean (I think) > within<-summary(BGaov)$"Error: Within"[[1]]$"Mean Sq" # S2^2Mean Square Value for Within Machine = 0.1819 > dfMachine<-summary(BGaov)$"Error: Machine"[[1]]$"Df" # DF for within = 3 > dfWithin<-summary(BGaov)$"Error: Within"[[1]]$"Df" # DF for within = 8 > machine<-summary(BGaov)$"Error: Machine"[[1]]$"Mean Sq" # S1^2Mean Square for Machine > between<-(machine-within)/((dfWithin/(dfMachine+1))+1) # (S1^2-S2^2)/J > total<-between+within > between # Between Run Variance [1] 0.53155 > within # Within Run Variance [1] 0.181825 > total # Total Variance [1] 0.713375 > betweenCV<-sqrt(between)/grandmean * 100 # Between Machine CV% > withinCV<-sqrt(within)/grandmean * 100 # Within Machine CV% > totalCV<-sqrt(total)/grandmean * 100 # Total CV% > #within confidence intervals (Chi Squared Method) > withinLCB<-within/qf(1-a,8,Inf) # Within LCB > withinUCB<-within/qf(a,8,Inf) # Within UCB > #Between Confidence Intervals (Modified Large Sample Method) > n1<-dfMachine > n2<-dfWithin > G1<-1-(1/qf(1-a,n1,Inf)) # According to Burdick and Graybill this should be a > G2<-1-(1/qf(1-a,n2,Inf)) > H1<-(1/qf(a,n1,Inf))-1 # and this should be 1-a, but my results don't agree > H2<-(1/qf(a,n2,Inf))-1 > G12<-((qf(1-a,n1,n2)-1)^2-(G1^2*qf(1-a,n1,n2)^2)-(H2^2))/qf(1-a,n1,n2) # again, should be a, not 1-a > H12<-((1-qf(a,n1,n2))^2-H1^2*qf(a,n1,n2)^2-G2^2)/qf(a,n1,n2) # again, should be 1-a, not a > Vu<-H1^2*machine^2+G2^2*within^2+H12*machine*within > Vl<-G1^2*machine^2+H2^2*within^2+G12*within*machine > betweenLCB<-(machine-within-sqrt(Vl))/J # Betwen LCB > betweenUCB<-(machine-within+sqrt(Vu))/J # Between UCB > #Total Confidence Intervals (Graybill-Wang Method) > y<-(machine+(J-1)*within)/J > totalLCB<-y-(sqrt(G1^2*machine^2+G2^2*(J-1)^2*within^2)/J) # Total LCB > totalUCB<-y+(sqrt(H1^2*machine^2+H2^2*(J-1)^2*within^2)/J) # Total UCB > result<-data.frame(Name=c("within", "between", "total"),CV=c(withinCV,betweenCV,totalCV),LCB=c(sqrt(withinLCB)/grandmean*100,sqrt(betweenLCB)/grandmean*100,sqrt(totalLCB)/grandmean*100),UCB=c(sqrt(withinUCB)/grandmean*100,sqrt(betweenUCB)/grandmean*100,sqrt(totalUCB)/grandmean*100)) > result Name CV LCB UCB 1 within 2.768443 1.869964 5.303702 2 between 4.733483 2.123816 18.551051 3 total 5.483625 3.590745 18.772389
Bert Gunter
2009-Sep-09 21:17 UTC
[R] Stats help with calculating between and within subject variance and confidence intervals
Paul: If these data are real -- or at least a reasonable facsimile -- then even though the machines might be considered "random" -- i.e. a sample from a potential population of machines -- there are too few of them to get a reasonable estimate of their variance. Better to treat them as fixed and just do a standard oneway anova with a single "within" = residual error component. Incidentally, this situation is quite common in gauge R&R = analytical method development studies. While the problem is unavoidable -- you only have so many machines or operators or whatever -- it is unfortunate that standard statistical references do not point out that it's just basically silly to try to estimate variance components with so few df, the resulting confidence intervals, as here, being so wide as to be useless (reflecting the inadequate information). Incidentally, a better way to get at this when there are sufficient df is via the lme() function in the nlme package -- it will work with unbalanced data and not just in the balanced data situation. But there would be a considerable learning curve required, I realize. Cheers, Bert Gunter Genentech Nonclinical Biostatistics -----Original Message----- From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Paul Sent: Wednesday, September 09, 2009 1:38 PM To: R-help at r-project.org Subject: [R] Stats help with calculating between and within subject variance and confidence intervals Hello. I'm trying to find a way in R to calculate between and within subject variances and confidence intervals for some analytical method development data. I've found a reference to a method in Burdick, R. K. & Graybill, F. A. 1992, Confidence Intervals on variance components, CRC Press. This example is for Balanced Data confidence interval calculation from Pg 62. The data are fill weights from bottles sampled randomly from a sample of four filling machines. There are 12 values, and the confidence intervals are for 1-2a = 95%. I have got the same results as the book but using slightly different fomulae (see variables for H1, G1 and H12 and G12). I'd appreciate any help, and any comments on whether their is a better way to do this. Thanks Paul. > BGBottles Machine weight 1 1 14.23 2 1 14.96 3 1 14.85 4 2 16.46 5 2 16.74 6 2 15.94 7 3 14.98 8 3 14.88 9 3 14.87 10 4 15.94 11 4 16.07 12 4 14.91 > BGaov<-aov(weight~Machine,data=BGBottles) > summary(BGaov) Df Sum Sq Mean Sq F value Pr(>F) Machine 3 5.3294 1.7765 9.7702 0.004733 ** Residuals 8 1.4546 0.1818 --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 > BGaov Call: aov(formula = weight ~ Machine, data = BGBottles) Terms: Machine Residuals Sum of Squares 5.329425 1.454600 Deg. of Freedom 3 8 Residual standard error: 0.4264094 Estimated effects may be unbalanced > BGaov<-aov(weight~Machine+Error(Machine),data=BGBottles) > summary(BGaov) Error: Machine Df Sum Sq Mean Sq Machine 3 5.3294 1.7765 Error: Within Df Sum Sq Mean Sq F value Pr(>F) Residuals 8 1.45460 0.18182 > BGaov Call: aov(formula = weight ~ Machine + Error(Machine), data = BGBottles) Grand Mean: 15.4025 Stratum 1: Machine Terms: Machine Sum of Squares 5.329425 Deg. of Freedom 3 Estimated effects may be unbalanced Stratum 2: Within Terms: Residuals Sum of Squares 1.4546 Deg. of Freedom 8 Residual standard error: 0.4264094 > confint<-95 > a<-(1-(confint/100))/2 > #str(BGaov) # get structure of BGAOV object > #str(summary(BGaov)) # get structure of BGaov summary > #summary(aov.1.e)[[2]][[1]]$"Df" # Could also use this syntax > grandmean<-as.vector(BGaov$"(Intercept)"[[1]][1]) # Grand Mean (I think) > within<-summary(BGaov)$"Error: Within"[[1]]$"Mean Sq" # S2^2Mean Square Value for Within Machine = 0.1819 > dfMachine<-summary(BGaov)$"Error: Machine"[[1]]$"Df" # DF for within = 3 > dfWithin<-summary(BGaov)$"Error: Within"[[1]]$"Df" # DF for within = 8 > machine<-summary(BGaov)$"Error: Machine"[[1]]$"Mean Sq" # S1^2Mean Square for Machine > between<-(machine-within)/((dfWithin/(dfMachine+1))+1) # (S1^2-S2^2)/J > total<-between+within > between # Between Run Variance [1] 0.53155 > within # Within Run Variance [1] 0.181825 > total # Total Variance [1] 0.713375 > betweenCV<-sqrt(between)/grandmean * 100 # Between Machine CV% > withinCV<-sqrt(within)/grandmean * 100 # Within Machine CV% > totalCV<-sqrt(total)/grandmean * 100 # Total CV% > #within confidence intervals (Chi Squared Method) > withinLCB<-within/qf(1-a,8,Inf) # Within LCB > withinUCB<-within/qf(a,8,Inf) # Within UCB > #Between Confidence Intervals (Modified Large Sample Method) > n1<-dfMachine > n2<-dfWithin > G1<-1-(1/qf(1-a,n1,Inf)) # According to Burdick and Graybill this should be a > G2<-1-(1/qf(1-a,n2,Inf)) > H1<-(1/qf(a,n1,Inf))-1 # and this should be 1-a, but my results don't agree > H2<-(1/qf(a,n2,Inf))-1 > G12<-((qf(1-a,n1,n2)-1)^2-(G1^2*qf(1-a,n1,n2)^2)-(H2^2))/qf(1-a,n1,n2) # again, should be a, not 1-a > H12<-((1-qf(a,n1,n2))^2-H1^2*qf(a,n1,n2)^2-G2^2)/qf(a,n1,n2) # again, should be 1-a, not a > Vu<-H1^2*machine^2+G2^2*within^2+H12*machine*within > Vl<-G1^2*machine^2+H2^2*within^2+G12*within*machine > betweenLCB<-(machine-within-sqrt(Vl))/J # Betwen LCB > betweenUCB<-(machine-within+sqrt(Vu))/J # Between UCB > #Total Confidence Intervals (Graybill-Wang Method) > y<-(machine+(J-1)*within)/J > totalLCB<-y-(sqrt(G1^2*machine^2+G2^2*(J-1)^2*within^2)/J) # Total LCB > totalUCB<-y+(sqrt(H1^2*machine^2+H2^2*(J-1)^2*within^2)/J) # Total UCB > result<-data.frame(Name=c("within", "between", "total"),CV=c(withinCV,betweenCV,totalCV),LCB=c(sqrt(withinLCB)/grandmean*10 0,sqrt(betweenLCB)/grandmean*100,sqrt(totalLCB)/grandmean*100),UCB=c(sqrt(wi thinUCB)/grandmean*100,sqrt(betweenUCB)/grandmean*100,sqrt(totalUCB)/grandme an*100)) > result Name CV LCB UCB 1 within 2.768443 1.869964 5.303702 2 between 4.733483 2.123816 18.551051 3 total 5.483625 3.590745 18.772389 ______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.