I want to investigate possible relationships between two discrete variables. I
have tried a few things but figured you guys might be able to point me at some
purpose-built functions.
Our scientists score the results of tests which are performed in, let's say, 8
positions. The scores are assigned a value of 1, 2, 3 or 4. I want to know if
there is a correlation between the test results and the position. The
scientists have a feeling that position 1 does not score as high as the others.
Not all 8 positions are always used, so the frequency of all test results can be
substantially biased towards the first position. Here is an example dataset
(not very biased) resulting from table(result, position):
        position
result    1   2   3   4   5   6   7   8
     0    3   3   2   2   0   3   3   0
     1   11   4   6   7   7   3   3   5
     2   38  37  32  38  31  21  23  27
     3   51  66  54  66  57  37  58  56
     4    3   1   3   0   1   0   1   1
Because the test results are highly quantized, the boxplots I tried all looked
pretty much the same.
The bias means that stacked barplots aren't that useful for visualising the
data. With a bit of data processing I guess I could normalise the total
frequencies of each test position.
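A minimal sketch of that normalisation, assuming the example counts are rebuilt by hand as a matrix (the names counts, tab and props are illustrative; in practice tab would simply be the result of table(result, position)). prop.table() with margin = 2 rescales each position's column to sum to 1, so the stacked bars all have the same height.

```r
# Example counts from the post: rows are results 0-4, columns are positions 1-8.
counts <- c( 3,  3,  2,  2,  0,  3,  3,  0,
            11,  4,  6,  7,  7,  3,  3,  5,
            38, 37, 32, 38, 31, 21, 23, 27,
            51, 66, 54, 66, 57, 37, 58, 56,
             3,  1,  3,  0,  1,  0,  1,  1)
tab <- matrix(counts, nrow = 5, byrow = TRUE,
              dimnames = list(result = as.character(0:4),
                              position = as.character(1:8)))

# Column proportions: each result as a fraction of that position's total.
props <- prop.table(tab, margin = 2)
round(props, 3)

# Stacked bars of proportions, so heavily used positions no longer dominate.
barplot(props, xlab = "position", ylab = "proportion of results",
        legend.text = rownames(props))
```

The columns of props can then be compared directly even though, in the example, position 6 has far fewer tests than position 4.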
I also tried a correlation between the two variables. The answer is non-zero,
but I am not sure that any relationship between the two variables would be
monotonic. (BTW, cor() gives me the correlation coefficient; how do I get the
"confidence" of the coefficient?)
Maybe I am overlooking the obvious, like just averaging the scores.
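A sketch of both ideas, assuming the example counts are held in a matrix (the names tab, result, position, pearson and spearman are illustrative, not from the original data). The mean score per position comes straight from the table. For the "confidence" of the coefficient, cor.test() in the stats package returns a p-value with the estimate, plus a confidence interval for the default Pearson method. A Spearman version is shown as well since the scores are ordinal; with this many ties its p-value is only approximate.

```r
# Example counts from the post: rows are results 0-4, columns are positions 1-8.
counts <- c( 3,  3,  2,  2,  0,  3,  3,  0,
            11,  4,  6,  7,  7,  3,  3,  5,
            38, 37, 32, 38, 31, 21, 23, 27,
            51, 66, 54, 66, 57, 37, 58, 56,
             3,  1,  3,  0,  1,  0,  1,  1)
tab <- matrix(counts, nrow = 5, byrow = TRUE,
              dimnames = list(result = as.character(0:4),
                              position = as.character(1:8)))
scores    <- as.numeric(rownames(tab))
positions <- as.numeric(colnames(tab))

# Mean score per position: score values weighted by their counts.
mean_score <- colSums(tab * scores) / colSums(tab)
round(mean_score, 2)

# Expand the table back to one (result, position) pair per test.
result   <- rep(scores[as.vector(row(tab))],    times = as.vector(tab))
position <- rep(positions[as.vector(col(tab))], times = as.vector(tab))

# Pearson: estimate, 95% confidence interval and p-value.
pearson <- cor.test(result, position)
pearson$estimate
pearson$conf.int
pearson$p.value

# Spearman: rank-based; exact = FALSE because the data are heavily tied.
spearman <- cor.test(result, position, method = "spearman", exact = FALSE)
spearman$estimate
spearman$p.value
```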
cheers
Paul -
This situation seems like an obvious candidate for a log-linear model.
See the book MASS (Venables and Ripley, "Modern Applied Statistics with S")
for details. They're beyond the scope of this list.
Or try help.search("log-linear").
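As a rough sketch of the log-linear idea (not necessarily the model intended here): fit the independence model to the example counts with glm() and a Poisson family, and use the residual deviance as a likelihood-ratio test of association between result and position. The names tab, long and indep are illustrative. Several expected counts in this table are small, so the chi-squared p-value should be treated as approximate.

```r
# Example counts from the original post: rows are results 0-4, columns positions 1-8.
counts <- c( 3,  3,  2,  2,  0,  3,  3,  0,
            11,  4,  6,  7,  7,  3,  3,  5,
            38, 37, 32, 38, 31, 21, 23, 27,
            51, 66, 54, 66, 57, 37, 58, 56,
             3,  1,  3,  0,  1,  0,  1,  1)
tab <- matrix(counts, nrow = 5, byrow = TRUE,
              dimnames = list(result = as.character(0:4),
                              position = as.character(1:8)))

# Long format: one row per cell, with factor columns result and position
# and the cell count in Freq.
long <- as.data.frame(as.table(tab))

# Independence model: main effects only, no result:position interaction.
indep <- glm(Freq ~ result + position, family = poisson, data = long)

# The residual deviance is the likelihood-ratio statistic (G^2) against
# the saturated model, on (5 - 1) * (8 - 1) = 28 degrees of freedom.
g2 <- deviance(indep)
df <- df.residual(indep)
p  <- pchisq(g2, df, lower.tail = FALSE)
c(G2 = g2, df = df, p.value = p)
```

A small p-value would point to some dependence of the score distribution on position; residuals(indep, type = "pearson") can then show which cells depart most from independence.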
(And ... can you find a way to break lines when sending your email?)
- tom blackwell - u michigan medical school - ann arbor -
On Tue, 11 Nov 2003, Paul Sorenson wrote:
> I want to investigate possible relationships between two discrete
> variables. I have tried a few things but figured you guys might be able to
> point me at some purpose built functions.
>
> Our scientists score results of tests which are performed in lets say, 8
> positions. The scores are assigned a value of 1,2,3 or 4. I want to know if
> there is a correlation between the test results and the position. The
> scientists have a feeling that position 1 does not score as high as the others.
>
> Not all 8 positions are always used, so the frequency of all test results
> can be substantially biased towards the first position. Here is an example
> dataset (not very biased) resulting from table(result, position):
>
>         position
> result    1   2   3   4   5   6   7   8
>      0    3   3   2   2   0   3   3   0
>      1   11   4   6   7   7   3   3   5
>      2   38  37  32  38  31  21  23  27
>      3   51  66  54  66  57  37  58  56
>      4    3   1   3   0   1   0   1   1
>
> Because the test results are highly quantized, the boxplots I tried all
> looked pretty much the same.
>
> The bias means that stacked barplots aren't that useful for visualising
> the data. With a bit of data processing I guess I could normalise the total
> frequencies of each test position.
>
> I also tried a correlation between the two variables. The answer is
> non-zero but I am not sure that any relationship between the two variables would
> be monotonic (BTW cor() give me the correlation coefficient, how do I get the
> "confidence" of the coefficient?)
>
> Maybe I am overlooking the obvious, like just averaging the scores.
>
> cheers
Don't use correlation for discrete variables. The
correlation coefficient does not vary freely between -1
and 1; it is tightly constrained by the joint probabilities of
nonzero outcomes in a rather counterintuitive way.
Please ask me off list if you don't follow this.
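One way to see the constraint, as a sketch for the simplest case of two 0/1 variables (this reading, and the helper name cor_bounds, are assumptions for illustration): with P(X = 1) = p and P(Y = 1) = q, the joint probability P(X = 1, Y = 1) can only range from max(0, p + q - 1) to min(p, q), and that limits the attainable correlation.

```r
# Attainable range of the Pearson correlation between two 0/1 variables
# with marginal probabilities p = P(X = 1) and q = P(Y = 1).
cor_bounds <- function(p, q) {
  stopifnot(p > 0, p < 1, q > 0, q < 1)
  denom <- sqrt(p * (1 - p) * q * (1 - q))
  # cor(X, Y) = (P(X = 1, Y = 1) - p * q) / denom, and the joint
  # probability is bounded by the marginals.
  lower <- (max(0, p + q - 1) - p * q) / denom
  upper <- (min(p, q) - p * q) / denom
  c(lower = lower, upper = upper)
}

cor_bounds(0.5, 0.5)  # equal marginals: the full range -1 to 1
cor_bounds(0.1, 0.5)  # unequal marginals: only -1/3 to 1/3
```

So with marginals of 0.1 and 0.5, a correlation of 1/3 already represents the strongest possible association.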
While I'm off-topic, following Tom B's comment: the only
way I know to break lines in emails using Explorer is to
actually type returns, which is what I always do here and is a
pain in the ****. If anyone knows how to overcome this,
I'd be very grateful. [I don't have a choice about using
Explorer.]
Simon Fear
Senior Statistician
Syne qua non Ltd
Tel: +44 (0) 1379 644449
Fax: +44 (0) 1379 644445
email: Simon.Fear at synequanon.com
web: http://www.synequanon.com
Further to my queries re relating discrete variables I have had a couple of
tips on things I could try. This has led me to attempt a "marginal
homogeneity" test
(http://ourworld.compuserve.com/homepages/jsuebersax/margin.htm).
o Does anyone have an opinion on whether this approach would be
appropriate?
o Does R have some built-in help to do this? I found a reference to
the McNemar test but not to the Stuart-Maxwell test.
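A sketch of what is available, under stated assumptions. mcnemar.test() in the stats package accepts any square table and, for more than two categories, tests symmetry (cell [i, j] against cell [j, i]). The Stuart-Maxwell statistic is not in base R as far as I know, so the helper stuart_maxwell() below is a hand-rolled illustration, and the 3 x 3 table of paired ratings is made up. Both tests assume a square table in which the same items are rated twice, which the result-by-position table is not, so their suitability depends on how the data are paired.

```r
# Hypothetical paired ratings (made-up numbers): rows are the first rating,
# columns are the second rating of the same items.
paired <- matrix(c(20,  5,  3,
                   10, 30,  6,
                    4,  8, 14),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(first  = c("low", "mid", "high"),
                                 second = c("low", "mid", "high")))

# Symmetry test for a k x k table (McNemar's test when k = 2).
sym <- mcnemar.test(paired)
sym$statistic
sym$parameter   # k * (k - 1) / 2 degrees of freedom
sym$p.value

# Hand-rolled Stuart-Maxwell test of marginal homogeneity (illustrative).
stuart_maxwell <- function(x) {
  k <- nrow(x)
  stopifnot(k == ncol(x), k >= 2)
  # Differences between row and column totals, dropping the last category.
  d <- (rowSums(x) - colSums(x))[-k]
  # Covariance matrix of those differences under the null hypothesis.
  S <- -(x + t(x))
  diag(S) <- rowSums(x) + colSums(x) - 2 * diag(x)
  S <- S[-k, -k, drop = FALSE]
  stat <- as.numeric(t(d) %*% solve(S, d))
  c(statistic = stat, df = k - 1,
    p.value = pchisq(stat, k - 1, lower.tail = FALSE))
}

stuart_maxwell(paired)
```

For a 2 x 2 table the helper reduces to McNemar's statistic without the continuity correction, which gives a quick sanity check.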
cheers