I want a test for the 'association' between two events, let's say the color of balls picked and the pickers (this is quite a good analogy to my data).

I have 200 different pickers P
I have 1,000 colors of balls C
I have 1,000,000 picks in total

I am totally confused about what test to apply, and when and why.

This is what I *think*:

I know how many balls each picker picked - so that marginal is fixed.
I know how many balls of each color there are - so that marginal is fixed.
I know the total picks.

I can test the 'association' between picker p and color c by doing the following...

  prob_of_pick(p)   = picks made by p / total picks
  prob_of_color(c)  = balls of color c / total picks
  prob_of_success   = prob_of_pick_of_color(pc)
                    = (picks made by p / total picks) * (balls of color c / total picks)

USE BINOMIAL DISTRIBUTION

  n = total picks
  k = number of balls of color c picked by picker p
  p = prob_of_pick_of_color(pc)

Significance of this particular observation =

  sig <- 0
  if (k < n * p) {
    # fewer picks than expected: lower tail, P(X <= k)
    for (x in 0:k) {
      sig <- sig + dbinom(x, n, p)
    }
  } else {
    # at least as many picks as expected: upper tail, P(X >= k)
    for (x in k:n) {
      sig <- sig + dbinom(x, n, p)
    }
  }

In the case that np and npq (with q = 1 - p) are both > 10, I use the normal approximation to the binomial distribution with mean np and variance np(1-p), with a continuity correction (+-0.5 depending on the direction of the test).

Should I use Fisher's exact test? What do I do when the numbers are very large?

Here is a sample of my data...

  COLOR  PICKER  PICKED  C_TOTAL  P_TOTAL  GRAND_TOTAL
  46458  rs           2      706     3285       878702
  46548  rs           6      725     3285       878702
  46557  rs           2      180     3285       878702
  46561  rs           1      243     3285       878702
  46565  rs           2     1864     3285       878702
  46579  rs           1     1263     3285       878702
  46589  rs           3     1168     3285       878702
  46600  rs           2      301     3285       878702
  46604  rs           1      105     3285       878702
  46609  rs           1      302     3285       878702
  46626  rs          32     1532     3285       878702
  ...
  89095  ho           1      265     1369       878702
  89124  ho           1      176     1369       878702
  89360  ho           2      290     1369       878702
  89392  ho           1      146     1369       878702
  89447  ho           1      114     1369       878702
  89550  ho           1      413     1369       878702
  89919  ho           1      174     1369       878702
  90002  ho           2      183     1369       878702
  90096  ho           1      154     1369       878702
  90123  ho           4     2130     1369       878702

How can I simply add an extra column to this data that gives me a measure of the significance of the 'association' (positive or negative) between picker and color?

I am totally confused! Sorry for the length of the email....

Dan.
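
P.S. For reference, here is a minimal sketch of the binomial calculation described above, vectorised so that it adds the extra column in one pass. It assumes the table has been read into a data frame (called `picks` here, a name made up for illustration, filled with a few rows from the sample) using the column names shown; `binom_sig`, `EXPECTED`, `SIG` and `DIRECTION` are likewise illustrative names. It uses `pbinom` for the tail sums instead of the explicit loop. It only reproduces the one-sided binomial tail as described, and is not a claim that this is the right test.

```r
# Hypothetical data frame holding a few rows of the sample data.
picks <- data.frame(
  COLOR       = c(46458, 46548, 46626, 90123),
  PICKER      = c("rs", "rs", "rs", "ho"),
  PICKED      = c(2, 6, 32, 4),
  C_TOTAL     = c(706, 725, 1532, 2130),
  P_TOTAL     = c(3285, 3285, 3285, 1369),
  GRAND_TOTAL = 878702
)

# One-sided binomial tail probability, in the direction of the deviation
# from the expected count n * p (same rule as the loop version).
binom_sig <- function(k, c_total, p_total, n) {
  p     <- (p_total / n) * (c_total / n)             # prob_of_pick_of_color
  lower <- pbinom(k, n, p)                           # P(X <= k)
  upper <- pbinom(k - 1, n, p, lower.tail = FALSE)   # P(X >= k)
  ifelse(k < n * p, lower, upper)
}

# Expected count under independence, tail probability, and direction.
picks$EXPECTED  <- picks$P_TOTAL * picks$C_TOTAL / picks$GRAND_TOTAL
picks$SIG       <- with(picks, binom_sig(PICKED, C_TOTAL, P_TOTAL, GRAND_TOTAL))
picks$DIRECTION <- ifelse(picks$PICKED < picks$EXPECTED, "negative", "positive")

print(picks)
```

Because `pbinom` works on whole vectors, the same call should handle the full table without looping over rows, which matters with this many picker/color pairs.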