Nick Negovetich
2014-Mar-14 18:15 UTC
[R] Determining Total Number of Multiple Comparisons
Greetings,

I'm running a series of chi-square tests to examine differences across categorical variables. I have three variables: sex (M/F), habitat (5 levels), and season (W, Sp, Su, F). A Cochran-Mantel-Haenszel test detects non-independence across my sex strata. I then subsetted my data into males (mat.M) and females (mat.F). Within each sex, I investigated independence between habitat and season (e.g., chisq.test(mat.M)). This is essentially a multiple comparison test, so I'm correcting my p-values using p.adjust(). My question pertains to 'n' in this function, and how 'n' should be calculated as subsets of the data are used to tease out the differences in habitat use across seasons.

Q1. Am I correct to specify 'n=2' when performing the test of independence for both male and female data? Example:

    p.adjust(chisq.test(mat.M)$p.value, n=2, method='bonferroni')

Non-independence was detected for both the male and female subsets. Now I'm interested in seasonal changes in habitat use, which would require additional multiple comparison tests. Thus, I have another question regarding the specification of 'n'.

Q2. If I examine the seasonal changes within males using prop.test(), do I add up all multiple comparisons that will be performed (females included), or just the number of tests that will be performed using the male data? The difference is n=5 for males only vs. n=10 for both sexes. Here's an example. Habitat types are Forest, Field, Crops, River, and Other, and these are the rownames of my matrix (males only):

    pval <- prop.test(mat.M['Forest',], colSums(mat.M))$p.value
    p.adjust(pval, n=5, method='bonferroni')

Lastly, I have detected differences in habitat use across seasons. I now want to determine which seasons differ within a specific habitat type. As before, I can pull out the count data and run a series of prop.test() calls for all 6 pairwise comparisons (W vs Sp, W vs Su, W vs F, Sp vs Su, Sp vs F, Su vs F). This leads to my final question.

Q3. Does 'n' in this case refer to only the 6 comparisons within a habitat type within a sex, or will I need to account for ALL tests that will be performed (n = 2 sexes * 5 habitats * 6 pairwise seasonal comparisons = 60 max)? I will not run pairwise seasonal comparisons for any habitat type that gives a non-significant p-value according to Q2 above.

Thanks for the help...
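
A minimal sketch of the three steps described above, using made-up counts in place of the real data (the values in mat.M below are hypothetical). It only illustrates the mechanics of 'n' in p.adjust(): with method='bonferroni', each p-value is multiplied by n and capped at 1, and n defaults to the number of p-values supplied. It does not settle which family size (2, 5, 10, 6, or 60) is appropriate.

```r
# Hypothetical counts for illustration only; not the data from the post.
habitats <- c("Forest", "Field", "Crops", "River", "Other")
seasons  <- c("W", "Sp", "Su", "F")
mat.M <- matrix(c(30, 22, 15, 28,
                  12, 18, 25, 14,
                  10, 15, 30, 12,
                   8, 12, 10,  9,
                   5,  6,  8,  7),
                nrow = 5, byrow = TRUE,
                dimnames = list(habitats, seasons))

# Q1: one chi-square test per sex. With n = 2, Bonferroni returns
# min(1, 2 * p) for the single p-value supplied.
p.raw <- chisq.test(mat.M)$p.value
p.adj <- p.adjust(p.raw, method = "bonferroni", n = 2)

# Q2: one prop.test() per habitat (5 tests within males).
# Adjusting the whole vector at once uses n = length(p) = 5;
# passing n = 10 instead would treat both sexes as one family.
p.hab <- sapply(habitats, function(h)
  prop.test(mat.M[h, ], colSums(mat.M))$p.value)
p.hab.adj5  <- p.adjust(p.hab, method = "bonferroni")
p.hab.adj10 <- p.adjust(p.hab, method = "bonferroni", n = 10)

# Q3: the 6 pairwise seasonal comparisons within one habitat.
# pairwise.prop.test() adjusts over those 6 comparisons.
pw <- pairwise.prop.test(mat.M["Forest", ], colSums(mat.M),
                         p.adjust.method = "bonferroni")

# The same 6 raw p-values adjusted for a family of 60 instead.
pw.raw <- pairwise.prop.test(mat.M["Forest", ], colSums(mat.M),
                             p.adjust.method = "none")$p.value
pw.raw <- as.vector(pw.raw)
pw.raw <- pw.raw[!is.na(pw.raw)]
pw.adj60 <- p.adjust(pw.raw, method = "bonferroni", n = 60)
```

Collecting the p-values of one family into a single vector and adjusting them together avoids having to track 'n' by hand; an explicit n is only needed when the tests in a family are adjusted in separate calls.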