Tal Galili
2009-Jul-24 17:23 UTC
[R] How to test frequency independence (in a 2 by 2 table) with many missing values
Hello dear R help group. My question is statistical and not R specific, yet I hope some of you might be willing to help. *Experiment settings*: We have a list of subjects. each of them went through two tests with the answer to each can be either 0 or 1. *Goal*: We want to know if the two experiments yielded different results in the subjects answers. *Statistical test (and why it won't work)*: Naturally we would turn to performing a mcnemar test. But here is the twist: we have missing values in our data. For our purpose, let's assume the missingnes is completely at random, and we also have no data to use for imputation. Also, we have much more missing data for experiment number 2 then in experiment number 1. So the question is, under these settings, how do we test for experiment effect on the outcome? So far I have thought of two options: 1) To perform the test only for subjects that has both values. But they are too scarce and will yield low power. 2) To treat the data as independent and do a pearson's chi square test (well, an exact fisher test that is) on all the non-missing data that we have. The problem with this is that our data is not fully independent (which is a prerequisite to chi test, if I understood it correctly). So I am not sure if that is a valid procedure or not. Any insights will be warmly welcomed. p.s: here is an example code producing this scenario. set.seed(102) x1 <- rbinom(100, 1, .5) x2 <- rbinom(100, 1, .3) X <- data.frame(x1,x2) tX <- table(X) margin.table(tX,1) margin.table(tX,2) mcnemar.test(tX) put.missings <- function(x.vector, na.percent) { turn.na <- rbinom(length(x.vector), 1, na.percent) x.vector[turn.na == 1] <- NA return(x.vector) } x1.m <- put.missings(x1, .3) x2.m <- put.missings(x2, .6) tX.m <- rbind(table(na.omit(x1.m)), table(na.omit(x2.m))) fisher.test(tX.m) With regards, Tal -- ---------------------------------------------- My contact information: Tal Galili Phone number: 972-50-3373767 FaceBook: Tal Galili My Blogs: http://www.r-statistics.com/ http://www.talgalili.com http://www.biostatistics.co.il [[alternative HTML version deleted]]
Robert A LaBudde
2009-Jul-24 18:16 UTC
[R] How to test frequency independence (in a 2 by 2 table) with many missing values
1. Drop all subjects where both test results are not present. These are uninformative on the difference. 2. Perform McNemar test on the remaining table. (Only the different pairs are informative.) This will give you a p-value on the data that actually contrasts the two tests. (For your example, the p-value is > X2<- data.frame(x1.m, x2.m) > tX2<- table(X2) > tX2 x2.m x1.m 0 1 0 6 5 1 15 4 > mcnemar.test(tX2) McNemar's Chi-squared test with continuity correction data: tX2 McNemar's chi-squared = 4.05, df = 1, p-value = 0.04417 The low power is a consequence of the data not being informative. You'll have to live with it. If you want to squeeze any more out this data, I'd guess about the only way to do it would be via a maximum likelihood approach with marginals for the one test only data, an assumption of, e.g., a binomial distribution for each test, and then use a likelihood ratio test for p1=p2 vs. p1!=p2. If you really feel up to it, you could program a randomization or bootstrap test equivalent to the ML approach, maintaining the marginal totals involved. At 01:23 PM 7/24/2009, Tal Galili wrote:>Hello dear R help group. > >My question is statistical and not R specific, yet I hope some of you might >be willing to help. > >*Experiment settings*: We have a list of subjects. each of them went >through two tests with the answer to each can be either 0 or 1. >*Goal*: We want to know if the two experiments yielded different results in >the subjects answers. >*Statistical test (and why it won't work)*: Naturally we would turn to >performing a mcnemar test. But here is the twist: we have missing values in >our data. >For our purpose, let's assume the missingnes is completely at random, and we >also have no data to use for imputation. Also, we have much more missing >data for experiment number 2 then in experiment number 1. > >So the question is, under these settings, how do we test for experiment >effect on the outcome? > >So far I have thought of two options: >1) To perform the test only for subjects that has both values. But they are >too scarce and will yield low power. >2) To treat the data as independent and do a pearson's chi square test >(well, an exact fisher test that is) on all the non-missing data that we >have. The problem with this is that our data is not fully independent (which >is a prerequisite to chi test, if I understood it correctly). So I am not >sure if that is a valid procedure or not. > >Any insights will be warmly welcomed. > > >p.s: here is an example code producing this scenario. > >set.seed(102) > >x1 <- rbinom(100, 1, .5) >x2 <- rbinom(100, 1, .3) > >X <- data.frame(x1,x2) >tX <- table(X) >margin.table(tX,1) >margin.table(tX,2) >mcnemar.test(tX) > >put.missings <- function(x.vector, na.percent) >{ >turn.na <- rbinom(length(x.vector), 1, na.percent) > x.vector[turn.na == 1] <- NA >return(x.vector) >} > > >x1.m <- put.missings(x1, .3) >x2.m <- put.missings(x2, .6) > >tX.m <- rbind(table(na.omit(x1.m)), table(na.omit(x2.m))) >fisher.test(tX.m) > > > > >With regards, >Tal > > > > > > > > > >-- >---------------------------------------------- > > >My contact information: >Tal Galili >Phone number: 972-50-3373767 >FaceBook: Tal Galili >My Blogs: >http://www.r-statistics.com/ >http://www.talgalili.com >http://www.biostatistics.co.il > > [[alternative HTML version deleted]] > >______________________________________________ >R-help at r-project.org mailing list >https://stat.ethz.ch/mailman/listinfo/r-help >PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >and provide commented, minimal, self-contained, reproducible code.===============================================================Robert A. LaBudde, PhD, PAS, Dpl. ACAFS e-mail: ral at lcfltd.com Least Cost Formulations, Ltd. URL: http://lcfltd.com/ 824 Timberlake Drive Tel: 757-467-0954 Virginia Beach, VA 23464-3239 Fax: 757-467-2947 "Vere scire est per causas scire"
Daniel Malter
2009-Jul-24 19:17 UTC
[R] How to test frequency independence (in a 2 by 2 table) withmany missing values
Hi Tal, you can use the lme4 library and use a random effects model. I will walk you through the example below (though, generally you should ask a statistician at your school about this). At the very end, I will include a loop for Monte-Carlo simulations that shows that the estimation of the fixed effects is unbiased, which assures you that you are estimating the "correct" coefficient. In addition, as Robert says, you should remove all subjects for which both observations are missing. ##First, we simulate some data. ##Note how the data structure matches the data structure you have: id=rep(1:100,each=2) #the subject ID test=rep(0:1,100) # first or second measurement/test (your variable of interest) randeff=rep(rnorm(100,0,1),each=2) # a random effect for each subject linear.predictor=randeff+2*test #the linear predictor #note the coefficient on the fixed effect of "test" is 2 probablty=exp(linear.predictor)/(1+exp(linear.predictor)) #compute the probability using the linear predictor y=rbinom(200,1,probablty) #draw 0/1 dependent variables #with probability according to the variable probablty miss=sample(1:200,20,replace=FALSE) #some indices for missing variables y[miss]=NA #replace the dependent variable with NAs for the "miss"ing indices ##Some additional data preparation id=factor(id) #make sure id is coded as a factor/dummy mydata=data.frame(y,test,id) #bind the data you need for estimation in a dataframe ##Run the analysis library(lme4) reg=lmer(y~test+(1|id),binomial,data=mydata) summary(reg) ##And here are the Monte-Carlo simulations library(lme4) bootstraps=200 estim=matrix(0,nrow=bootstraps,ncol=2) for(i in 1:bootstraps){ id=rep(1:100,each=2) test=rep(0:1,100) randeff=rep(rnorm(100,0,1),each=2) linear.predictor=randeff+2*test probablty=exp(linear.predictor)/(1+exp(linear.predictor)) y=rbinom(200,1,probablty) miss=sample(1:200,20,replace=FALSE) y[miss]=NA id=factor(id) mydata=data.frame(y,test,id) reg=lmer(y~test+(1|id),binomial,data=mydata) estim[i,]=reg at fixef } mean(estim[,1]) #mean estimate of the intercept from 200 MC-simulations #should be close to zero (no intercept in the linear.predictor) mean(estim[,2]) #mean estimate of the 'test' fixed effect from 200 MC-simulations #should be close to 2 (as in the linear.predictor) #plot estimated coefficients from 200 MC simulations par(mfcol=c(1,2)) hist(estim[,1],main="Intercept") hist(estim[,2],main="Effect of variable 'test'") HTH, Daniel ------------------------- cuncta stricte discussurus ------------------------- -----Urspr?ngliche Nachricht----- Von: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] Im Auftrag von Tal Galili Gesendet: Friday, July 24, 2009 1:23 PM An: r-help at r-project.org Betreff: [R] How to test frequency independence (in a 2 by 2 table) withmany missing values Hello dear R help group. My question is statistical and not R specific, yet I hope some of you might be willing to help. *Experiment settings*: We have a list of subjects. each of them went through two tests with the answer to each can be either 0 or 1. *Goal*: We want to know if the two experiments yielded different results in the subjects answers. *Statistical test (and why it won't work)*: Naturally we would turn to performing a mcnemar test. But here is the twist: we have missing values in our data. For our purpose, let's assume the missingnes is completely at random, and we also have no data to use for imputation. Also, we have much more missing data for experiment number 2 then in experiment number 1. So the question is, under these settings, how do we test for experiment effect on the outcome? So far I have thought of two options: 1) To perform the test only for subjects that has both values. But they are too scarce and will yield low power. 2) To treat the data as independent and do a pearson's chi square test (well, an exact fisher test that is) on all the non-missing data that we have. The problem with this is that our data is not fully independent (which is a prerequisite to chi test, if I understood it correctly). So I am not sure if that is a valid procedure or not. Any insights will be warmly welcomed. p.s: here is an example code producing this scenario. set.seed(102) x1 <- rbinom(100, 1, .5) x2 <- rbinom(100, 1, .3) X <- data.frame(x1,x2) tX <- table(X) margin.table(tX,1) margin.table(tX,2) mcnemar.test(tX) put.missings <- function(x.vector, na.percent) { turn.na <- rbinom(length(x.vector), 1, na.percent) x.vector[turn.na == 1] <- NA return(x.vector) } x1.m <- put.missings(x1, .3) x2.m <- put.missings(x2, .6) tX.m <- rbind(table(na.omit(x1.m)), table(na.omit(x2.m))) fisher.test(tX.m) With regards, Tal -- ---------------------------------------------- My contact information: Tal Galili Phone number: 972-50-3373767 FaceBook: Tal Galili My Blogs: http://www.r-statistics.com/ http://www.talgalili.com http://www.biostatistics.co.il [[alternative HTML version deleted]] ______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.