thr3ads.net - R help - [R] Estimating correlation in multiple measures data [Mar 2011]

Michal Figurski

2011-Mar-23 20:17 UTC

[R] Estimating correlation in multiple measures data

Dear R-helpers,

This may sound simple to you, but I'm a beginner in this, so please be 
forgiving.
I have the following problem: two analytes, ProteinA and ProteinB, were 
measured in patients' blood on 4 occasions. How do I correctly evaluate 
the correlation between ProteinA and ProteinB?

I tried:
library(nlme)  # provides gls() and corSymm()
x <- data.frame(Patient.ID=rep(1:10, each=4), Visit=rep(1:4, 10),
ProteinA=rnorm(n=40, mean=10), ProteinB=rnorm(n=40, mean=13, sd=0.7))

gls(ProteinB~ProteinA, data=x, correlation=corSymm(form=~Visit|Patient.ID))

In the results I see correlations between the occasions and between 
terms, but not between the proteins. Where should I look for it?

-- 
Michal J. Figurski, PhD
HUP, Pathology & Laboratory Medicine
Biomarker Research Laboratory
3400 Spruce St. 7 Maloney S
Philadelphia, PA 19104
tel. (215) 662-3413

Peter Langfelder

2011-Mar-23 20:31 UTC

On Wed, Mar 23, 2011 at 1:17 PM, Michal Figurski
<figurski at mail.med.upenn.edu> wrote:
> Dear R-helpers,
>
> This may sound simple to you, but I'm a beginner in this, so please be
> forgiving.
> I have a following problem: two analytes were measured in patient's
> blood on 4 occasions: ProteinA and ProteinB. How to correctly evaluate
> correlation between ProteinA and ProteinB?
>
> I tried:
> x <- data.frame(Patient.ID=rep(1:10, each=4), Visit=rep(c(1:4),10),
> ProteinA=rnorm(m=10, n=40), ProteinB=rnorm(m=13,s=0.7,n=40))

Simply use

cor(x$ProteinA, x$ProteinB)

If you want a p-value and confidence intervals, use

cor.test(x$ProteinA, x$ProteinB)


Peter

Michal Figurski

2011-Mar-24 19:45 UTC

Peter,

Regarding 1) I do not agree. See the following, simplified example:
x <- data.frame(ID=rep(1:2, each=4), Visit=rep(c(1:4), 2), 
ptA=c(7,8,9,10,17,18,19,20), ptB=c(5,6,7,8,21,20,19,18))

In this data frame you have only 2 patients with 4 visits each, but the 
correlation of ptA and ptB runs in opposite directions in these 2 patients. 
See the plot:
plot(ptB~ptA, x)

If you do 'cor.test(x$ptA, x$ptB)' you get a very high correlation 
(0.961) and a significant p-value (0.0001356). However, doing it by patient:
xx <- x[x$ID==1,]; cor.test(xx$ptA, xx$ptB)
xx <- x[x$ID==2,]; cor.test(xx$ptA, xx$ptB)
you get 2 opposite correlation values (1 and -1). So in the instance of 
patient 2 the correlation on individual level is _very_ far from the one 
estimated on the whole dataset. My problem is: in what way can I 
estimate the correlation between ptA and ptB taking into account the 
multiple measures?
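
One possible way to make the question concrete (a sketch under assumptions, not a definitive answer) is to remove each patient's own mean from both variables and correlate what is left, which gives a within-patient correlation. The names `r_pooled`, `r_by_patient` and `r_within` are made up for this illustration.

```r
# Simplified two-patient example from this message
x <- data.frame(ID = rep(1:2, each = 4), Visit = rep(1:4, 2),
                ptA = c(7, 8, 9, 10, 17, 18, 19, 20),
                ptB = c(5, 6, 7, 8, 21, 20, 19, 18))

# Pooled correlation: ignores which patient each point belongs to
r_pooled <- cor(x$ptA, x$ptB)

# One correlation per patient
r_by_patient <- sapply(split(x, x$ID), function(d) cor(d$ptA, d$ptB))

# Within-patient correlation: subtract each patient's mean, then correlate
ptA_centred <- x$ptA - ave(x$ptA, x$ID)
ptB_centred <- x$ptB - ave(x$ptB, x$ID)
r_within <- cor(ptA_centred, ptB_centred)

print(round(r_pooled, 3))
print(r_by_patient)
print(r_within)
```

In this example the per-patient correlations are 1 and -1 and the within-patient correlation is 0, while the pooled value of about 0.96 is driven by the difference in level between the two patients. A p-value from cor.test on the centred values would not account for the patient means having been estimated, so it should be read with caution.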

Regarding 2) This is less of a problem. The simplest solution is to 
build a model with and without correlation and compare them with anova. 
The p-value from anova will indicate the significance of the correlation.

Regarding 3) I know of this solution - the Bland & Altman paper in BMJ 
(1994) recommended it. I'm looking for something more sophisticated...

Best regards,

--
Michal J. Figurski, PhD
HUP, Pathology & Laboratory Medicine
Biomarker Research Laboratory
3400 Spruce St. 7 Maloney S
Philadelphia, PA 19104
tel. (215) 662-3413

On 3/24/2011 1:58 PM, Peter Langfelder wrote:
> I see, so it's more of a statistics than R question. A couple thoughts:
>
> 1. The fact that 4 measurements in each single patient are possibly
> highly related should not change the correlation, only the p-value.
> Here's an example: generate two variables a and b
>
> a = c(1:10);
> b = sample(a) + a
>
>> cor(a,b)
>            [,1]
> [1,] 0.4735424
>> cor (rep(a, 4), rep(b, 4))
>            [,1]
> [1,] 0.4735424
>
> Notice that the correlation of a,b, and the correlation of 4-times
> repeated a with 4-times repeated b is exactly the same.
>
> 2. The calculation of a p-value is more complicated and I don't have a
> good answer, but an upper bound on the p-value can be obtained by
> calculating the p-value pretending that there are only 10
> measurements. In the package WGCNA we have a function for that, it's
> called corPvalueStudent.
>
> 3. If the 4 measurements for each patient are very similar, you could
> simply average them, then proceed as if you had 10 independent
> measurements.
>
> Peter
>
> On Thu, Mar 24, 2011 at 10:38 AM, Michal Figurski
> <figurski at mail.med.upenn.edu> wrote:
>> Peter,
>>
>> This is actually too simple - it doesn't take into account the fact that
>> the data were measured several times on the same subject. This is one
>> thing I know for sure, that one should not just lump such data together
>> and pretend that each point comes from a different patient...
>>
>> --
>> Michal J. Figurski, PhD
>> HUP, Pathology & Laboratory Medicine
>> Biomarker Research Laboratory
>> 3400 Spruce St. 7 Maloney S
>> Philadelphia, PA 19104
>> tel. (215) 662-3413
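
Points 2 and 3 of Peter's reply quoted above can be sketched in base R on the simulated data from the first message. This is only an illustration under assumptions: the seed is arbitrary, `p_from_r` is a hypothetical helper that applies the usual Student t formula for a Pearson correlation (it is not the WGCNA function itself), and the explicit `mean`/`sd` arguments stand in for the abbreviated `m`/`s`.

```r
set.seed(1)  # arbitrary seed, only for reproducibility
x <- data.frame(Patient.ID = rep(1:10, each = 4), Visit = rep(1:4, 10),
                ProteinA = rnorm(n = 40, mean = 10),
                ProteinB = rnorm(n = 40, mean = 13, sd = 0.7))

# Hypothetical helper: two-sided p-value for a Pearson r from n observations
p_from_r <- function(r, n) {
  t_stat <- r * sqrt((n - 2) / (1 - r^2))
  2 * pt(-abs(t_stat), df = n - 2)
}

r_all <- cor(x$ProteinA, x$ProteinB)
p_naive <- p_from_r(r_all, n = 40)         # treats all 40 points as independent
p_conservative <- p_from_r(r_all, n = 10)  # point 2: pretend there are only 10

# Point 3: average the 4 visits per patient, then correlate the 10 means
patient_means <- aggregate(cbind(ProteinA, ProteinB) ~ Patient.ID,
                           data = x, FUN = mean)
ct_means <- cor.test(patient_means$ProteinA, patient_means$ProteinB)

print(c(naive = p_naive, conservative = p_conservative))
print(ct_means)
```

Averaging discards the within-patient information, so this sketch speaks only to the between-patient association; it does not address the objection raised in the reply about opposite trends within individual patients.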
