I want to investigate possible relationships between two discrete variables. I have tried a few things but figured you guys might be able to point me at some purpose-built functions.

Our scientists score results of tests which are performed in, let's say, 8 positions. The scores are assigned a value of 1, 2, 3 or 4. I want to know if there is a correlation between the test results and the position. The scientists have a feeling that position 1 does not score as high as the others.

Not all 8 positions are always used, so the frequency of all test results can be substantially biased towards the first position. Here is an example dataset (not very biased) resulting from table(result, position):

       1  2  3  4  5  6  7  8
    0  3  3  2  2  0  3  3  0
    1 11  4  6  7  7  3  3  5
    2 38 37 32 38 31 21 23 27
    3 51 66 54 66 57 37 58 56
    4  3  1  3  0  1  0  1  1

Because the test results are highly quantized, the boxplots I tried all looked pretty much the same.

The bias means that stacked barplots aren't that useful for visualising the data. With a bit of data processing I guess I could normalise the total frequencies of each test position.

I also tried a correlation between the two variables. The answer is non-zero, but I am not sure that any relationship between the two variables would be monotonic. (BTW, cor() gives me the correlation coefficient; how do I get the "confidence" of the coefficient?)

Maybe I am overlooking the obvious, like just averaging the scores.

cheers
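The normalising, averaging and "confidence" questions above can be sketched in base R. This is only an illustration, assuming the counts are typed in by hand from the table in the message; the object names (counts, tab, col_prop, mean_score, raw, ct) are made up for the example.

```r
# Counts copied from table(result, position) in the message above:
# rows are scores 0..4, columns are positions 1..8.
counts <- matrix(
  c( 3,  3,  2,  2,  0,  3,  3,  0,
    11,  4,  6,  7,  7,  3,  3,  5,
    38, 37, 32, 38, 31, 21, 23, 27,
    51, 66, 54, 66, 57, 37, 58, 56,
     3,  1,  3,  0,  1,  0,  1,  1),
  nrow = 5, byrow = TRUE,
  dimnames = list(result = 0:4, position = 1:8)
)
tab <- as.table(counts)

# Normalise within each position: every column of col_prop sums to 1,
# so positions that were used more often no longer dominate.
col_prop <- prop.table(tab, margin = 2)
print(round(col_prop, 3))

# "Just averaging the scores": mean score per position.
scores <- as.numeric(rownames(tab))
mean_score <- colSums(tab * scores) / colSums(tab)
print(round(mean_score, 3))

# cor.test() needs one row per observation, so expand the counts.
long <- as.data.frame(tab)
raw <- long[rep(seq_len(nrow(long)), long$Freq), c("result", "position")]
raw$result <- as.numeric(as.character(raw$result))
raw$position <- as.numeric(as.character(raw$position))

# Pearson correlation with a p-value and a confidence interval.
ct <- cor.test(raw$position, raw$result)
print(ct$estimate)
print(ct$conf.int)
print(ct$p.value)
```

Passing col_prop to barplot() would give stacked bars of equal height, one per position. cor.test() also accepts method = "kendall" or method = "spearman", although with this many ties it falls back to approximate p-values.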
Paul -

This situation seems like an obvious candidate for a log-linear model. See the book MASS for details; they're beyond the scope of this list. Or try help.search("log-linear").

(And ... can you find a way to break lines when sending your email?)

- tom blackwell - u michigan medical school - ann arbor -

On Tue, 11 Nov 2003, Paul Sorenson wrote:
> I want to investigate possible relationships between two discrete variables. [...]
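As a rough sketch of what a log-linear fit could look like for the table in the original message, the independence model can be fitted as a Poisson GLM with glm() from base R. This is one possible reading of the suggestion, not necessarily the analysis Tom had in mind; the object names (counts, long, fit, g2, p_value) are made up for the example.

```r
# Counts copied from table(result, position) in the original message.
counts <- matrix(
  c( 3,  3,  2,  2,  0,  3,  3,  0,
    11,  4,  6,  7,  7,  3,  3,  5,
    38, 37, 32, 38, 31, 21, 23, 27,
    51, 66, 54, 66, 57, 37, 58, 56,
     3,  1,  3,  0,  1,  0,  1,  1),
  nrow = 5, byrow = TRUE,
  dimnames = list(result = 0:4, position = 1:8)
)

# One row per cell, with result and position as factors.
long <- as.data.frame(as.table(counts))

# Log-linear independence model: log(expected count) = result + position.
# A poor fit suggests the score distribution differs between positions.
fit <- glm(Freq ~ result + position, family = poisson, data = long)

# The residual deviance is the likelihood-ratio statistic for independence,
# with (5 - 1) * (8 - 1) = 28 degrees of freedom.
g2 <- deviance(fit)
p_value <- pchisq(g2, df.residual(fit), lower.tail = FALSE)
print(c(G2 = g2, df = df.residual(fit), p.value = p_value))
```

Several cells in the score 0 and score 4 rows are very small, so the chi-squared approximation is shaky here; chisq.test(counts, simulate.p.value = TRUE) is one way to cross-check the independence test by simulation.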
Don't use correlation for discrete variables. The correlation coefficient does not vary freely between -1 and 1; it is tightly constrained by the joint probabilities of nonzero outcomes in a rather counterintuitive way. Please ask me off list if you don't follow this.

While I'm off-topic, following Tom B's comment: the only way I know to break lines in emails using Explorer is to actually type returns, which is what I always do here and is a pain in the ****. If anyone knows how to overcome this I'd be very grateful. [I don't have a choice about using Explorer.]

Simon Fear
Senior Statistician
Syne qua non Ltd
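One way to see a constraint of this kind (an illustration only, and not necessarily the argument Simon had in mind) is the simplest discrete case of two 0/1 variables. With P(X = 1) = p and P(Y = 1) = q, the joint probability P(X = 1, Y = 1) can only lie between max(0, p + q - 1) and min(p, q), which caps the attainable correlation. The helper phi_bounds() below is a made-up name for this sketch.

```r
# Attainable range of the Pearson correlation between two 0/1 variables
# with P(X = 1) = p and P(Y = 1) = q.
phi_bounds <- function(p, q) {
  denom <- sqrt(p * (1 - p) * q * (1 - q))
  # The joint probability P(X = 1, Y = 1) is bounded by the margins.
  joint_max <- min(p, q)
  joint_min <- max(0, p + q - 1)
  c(lower = (joint_min - p * q) / denom,
    upper = (joint_max - p * q) / denom)
}

# Equal margins: the full range from -1 to 1 is attainable.
print(phi_bounds(0.5, 0.5))

# Unequal margins: the correlation cannot leave a much narrower range,
# however strongly the two variables are associated.
b <- phi_bounds(0.1, 0.5)
print(b)
```

So a correlation that looks small can still sit at the edge of what the margins allow, which is why its size alone is hard to interpret for heavily discretised data.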
Further to my queries re relating discrete variables, I have had a couple of tips on things I could try. This has led me to attempt a "marginal homogeneity" test (http://ourworld.compuserve.com/homepages/jsuebersax/margin.htm).

o Does anyone have an opinion on whether this approach would be appropriate?
o Does R have some built-in help to do this? I found a reference to the McNemar test but not to the Stuart-Maxwell test.

cheers
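For reference, the Stuart-Maxwell statistic is short enough to sketch by hand in base R. The function name stuart_maxwell() and the 3 x 3 table below are made up for this illustration and are not data from the thread. Marginal homogeneity tests assume a square table of paired classifications (the same categories on rows and columns), which is a different layout from the 5 x 8 result-by-position table earlier in the thread.

```r
# Stuart-Maxwell test of marginal homogeneity for a square k x k table
# of paired classifications (hand-rolled sketch, not a packaged function).
stuart_maxwell <- function(tab) {
  tab <- as.matrix(tab)
  stopifnot(nrow(tab) == ncol(tab))
  k <- nrow(tab)
  # Differences between row and column totals, dropping the last category.
  d <- (rowSums(tab) - colSums(tab))[-k]
  # Estimated covariance matrix of those differences.
  S <- -(tab + t(tab))
  diag(S) <- rowSums(tab) + colSums(tab) - 2 * diag(tab)
  S <- S[-k, -k, drop = FALSE]
  stat <- sum(d * solve(S, d))
  c(statistic = stat, df = k - 1,
    p.value = pchisq(stat, k - 1, lower.tail = FALSE))
}

# Hypothetical paired ratings (rows: first rating, columns: second rating).
paired <- matrix(c(20,  5,  3,
                    8, 15,  4,
                    2,  6, 25),
                 nrow = 3, byrow = TRUE)
print(stuart_maxwell(paired))

# For a 2 x 2 table the statistic reduces to McNemar's test
# without the continuity correction.
m2 <- matrix(c(10, 5, 3, 12), nrow = 2)
print(stuart_maxwell(m2))
print(mcnemar.test(m2, correct = FALSE))
```

mcnemar.test() in base R also accepts square tables larger than 2 x 2, but there it tests symmetry of the off-diagonal cells, which is a stronger hypothesis than marginal homogeneity.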