Michal Figurski
2011-Mar-23 20:17 UTC
[R] Estimating correlation in multiple measures data
Dear R-helpers,

This may sound simple to you, but I'm a beginner in this, so please be forgiving.

I have the following problem: two analytes, ProteinA and ProteinB, were measured in patients' blood on 4 occasions. How do I correctly evaluate the correlation between ProteinA and ProteinB?

I tried:

    library(nlme)  # provides gls() and corSymm()
    x <- data.frame(Patient.ID = rep(1:10, each = 4),
                    Visit = rep(1:4, 10),
                    ProteinA = rnorm(n = 40, mean = 10),
                    ProteinB = rnorm(n = 40, mean = 13, sd = 0.7))
    gls(ProteinB ~ ProteinA, data = x,
        correlation = corSymm(form = ~ Visit | Patient.ID))

In the results I see correlations between the occasions and between terms, but not between the proteins. Where should I look for it?

-- 
Michal J. Figurski, PhD
HUP, Pathology & Laboratory Medicine
Biomarker Research Laboratory
3400 Spruce St. 7 Maloney S
Philadelphia, PA 19104
tel. (215) 662-3413
Peter Langfelder
2011-Mar-23 20:31 UTC
[R] Estimating correlation in multiple measures data
On Wed, Mar 23, 2011 at 1:17 PM, Michal Figurski <figurski at mail.med.upenn.edu> wrote:
> Dear R-helpers,
>
> This may sound simple to you, but I'm a beginner in this, so please be
> forgiving.
> I have a following problem: two analytes were measured in patient's blood on
> 4 occasions: ProteinA and ProteinB. How to correctly evaluate correlation
> between ProteinA and ProteinB?
>
> I tried:
> x <- data.frame(Patient.ID=rep(1:10, each=4), Visit=rep(c(1:4),10),
> ProteinA=rnorm(m=10, n=40), ProteinB=rnorm(m=13,s=0.7,n=40))

Simply use

    cor(x$ProteinA, x$ProteinB)

If you want a p-value and confidence intervals, use

    cor.test(x$ProteinA, x$ProteinB)

Peter
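For anyone who wants to run the suggestion above end to end, here is a minimal sketch that regenerates the simulated data from the original post and applies both calls. The seed is an assumption added only for repeatability, and the argument names are spelled out in full. Because the two proteins are drawn independently, no real association is expected. Note that this treats all 40 rows as independent observations.

```r
set.seed(1)  # assumption: seed added only so the run is repeatable

x <- data.frame(
  Patient.ID = rep(1:10, each = 4),
  Visit      = rep(1:4, times = 10),
  ProteinA   = rnorm(n = 40, mean = 10),
  ProteinB   = rnorm(n = 40, mean = 13, sd = 0.7)
)

r  <- cor(x$ProteinA, x$ProteinB)        # Pearson correlation, all rows pooled
ct <- cor.test(x$ProteinA, x$ProteinB)   # same estimate plus test and interval

r
ct$p.value    # p-value for H0: correlation = 0
ct$conf.int   # 95% confidence interval for the correlation
```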
Michal Figurski
2011-Mar-24 19:45 UTC
[R] Estimating correlation in multiple measures data
Peter,

Regarding 1) I do not agree. See the following, simplified example:

    x <- data.frame(ID=rep(1:2, each=4), Visit=rep(c(1:4), 2),
                    ptA=c(7,8,9,10,17,18,19,20), ptB=c(5,6,7,8,21,20,19,18))

In this data frame you have only 2 patients with 4 visits each, but the correlation of ptA and ptB is in opposite direction in these 2 patients. See the plot:

    plot(ptB~ptA, x)

If you do 'cor.test(x$ptA, x$ptB)' you get a very high correlation (0.961) and a significant p-value (0.0001356). However, doing it by patient:

    xx <- x[x$ID==1,]; cor.test(xx$ptA, xx$ptB)
    xx <- x[x$ID==2,]; cor.test(xx$ptA, xx$ptB)

you get 2 opposite correlation values (1 and -1). So in the instance of patient 2 the correlation on individual level is _very_ far from the one estimated on the whole dataset.

My problem is: in what way can I estimate the correlation between ptA and ptB taking into account the multiple measures?

Regarding 2) This is not as much of a problem. Simplest solution is to build a model with and without correlation and compare them with anova. P value from anova will indicate significance of the correlation.

Regarding 3) I know of this solution - Bland & Altman paper from BMJ 1994 recommended that. I'm looking for something more sophisticated...

Best regards,

-- 
Michal J. Figurski, PhD
HUP, Pathology & Laboratory Medicine
Biomarker Research Laboratory
3400 Spruce St. 7 Maloney S
Philadelphia, PA 19104
tel. (215) 662-3413

On 3/24/2011 1:58 PM, Peter Langfelder wrote:
> I see, so it's more of a statistics than R question. A couple thoughts:
>
> 1. The fact that 4 measurements in each single patient are possibly
> highly related should not change the correlation, only the p-value.
> Here's an example: generate two variables a and b
>
> a = c(1:10);
> b = sample(a) + a
>
> > cor(a,b)
>           [,1]
> [1,] 0.4735424
> > cor (rep(a, 4), rep(b, 4))
>           [,1]
> [1,] 0.4735424
>
> Notice that the correlation of a,b, and the correlation of 4-times
> repeated a with 4-times repeated b is exactly the same.
>
> 2. The calculation of a p-value is more complicated and I don't have a
> good answer, but an upper bound on the p-value can be obtained by
> calculating the p-value pretending that there are only 10
> measurements. In the package WGCNA we have a function for that, it's
> called corPvalueStudent.
>
> 3. If the 4 measurements for each patient are very similar, you could
> simply average them, then proceed as if you had 10 independent
> measurements.
>
> Peter
>
> On Thu, Mar 24, 2011 at 10:38 AM, Michal Figurski
> <figurski at mail.med.upenn.edu> wrote:
>> Peter,
>>
>> This is actually too simple - it doesn't take into account the fact that the
>> data were measured several times on the same subject. This is one thing I
>> know for sure, that one should not just lump such data together and pretend
>> that each point comes from a different patient...
>>
>> --
>> Michal J. Figurski, PhD
>> HUP, Pathology & Laboratory Medicine
>> Biomarker Research Laboratory
>> 3400 Spruce St. 7 Maloney S
>> Philadelphia, PA 19104
>> tel. (215) 662-3413
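The disagreement in this message comes down to two different correlations hiding in the same data. A rough sketch, using Michal's two-patient example: centring each variable on its patient mean isolates the within-patient association, while correlating the patient means (Peter's point 3) isolates the between-patient one. The names `ptA_w`, `ptB_w`, `means` and the `r_*` variables are introduced here for illustration only. This is one simple way to separate the two components, not necessarily what either poster had in mind.

```r
x <- data.frame(ID = rep(1:2, each = 4), Visit = rep(1:4, times = 2),
                ptA = c(7, 8, 9, 10, 17, 18, 19, 20),
                ptB = c(5, 6, 7, 8, 21, 20, 19, 18))

# Pooled correlation: treats all 8 rows as independent observations
r_pooled <- cor(x$ptA, x$ptB)

# Per-patient correlations (one value per ID)
r_by_patient <- sapply(split(x, x$ID), function(d) cor(d$ptA, d$ptB))

# Within-patient: subtract each patient's own mean, then correlate
x$ptA_w <- x$ptA - ave(x$ptA, x$ID)
x$ptB_w <- x$ptB - ave(x$ptB, x$ID)
r_within <- cor(x$ptA_w, x$ptB_w)

# Between-patient: correlate the per-patient means
means <- aggregate(cbind(ptA, ptB) ~ ID, data = x, FUN = mean)
r_between <- cor(means$ptA, means$ptB)

round(c(pooled = r_pooled, within = r_within, between = r_between), 3)
```

With only two patients the between-patient value is trivially 1 or -1, so the toy data mainly show that the pooled value of about 0.96 reflects differences between the patients, while the opposite within-patient trends cancel out to a within-patient correlation of zero.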