thr3ads.net - R help - [R] RE: Comparison of correlation coefficients

If this information is useful, please help other people find it:
Share via:

Christian.Stratowa@vie.boehringer-ingelheim.com

2004-Jul-21 11:19 UTC

[R] RE: Comparison of correlation coefficients - Details

Dear all

I apologize for cross-posting, but first it is accepted custom to 
thank the repliers and give a summary, and second I have still 
the feeling that this problem might be a general statistical problem 
and not necessarily related to microarrays only, but I might be wrong.

First, I want to thank Robert Gentleman, Mark Kimpel and Mark Reiners
for their kind replies. Robert Gentleman kindly pointed me to the
Bioconductor package "MeasurementError.cor" as alternative to
"cor.test".
Mark Kimpel suggested that 2-way factorial Anova or the Bioconductor
package "limma", respectively, may be helpful. Mark Reiners suggested
to use the p-value of "cor.test" to test the significance.

Maybe, I miss the point, but being not a statistician I am still unsure
if it is possible to compare correlation coefficients from different 
sample sets. Both, the p-values from "cor.test" and from
"compcorr",
could be used as measure of the significance. 
However, is it possible to "normalize" correlation coefficients from
different sample sets? Could an expression such as "corr * (1 - pval)"
be used for normalization? Maybe, it is not possible to normalize
correlation coefficients?
Would a barplot comparing the correlation coefficients between two
genes for different tissues be meaningful? (Alternatively, I have 
tried to use (1-pval) to calculate the gray-level of the bars.)

Any further suggestions would be appreciated very much.

Best regards
Christian Stratowa

-----Original Message-----
From: Stratowa,Dr.,Christian FEX BIG-AT-V 
Sent: Monday, July 19, 2004 15:00
To: 'bioconductor at stat.math.ethz.ch'
Subject: Comparison of correlation coefficients - Details


Dear all

Maybe, my last mail did not explain my problem correctly:
Since we are interested, which genes have similar expression profiles in a
certain tissue or in different tissues, we have calculated the 
correlation coefficients between all 46,000 x 46,000 genes of the 
HG_U133A/B chipset for about 70 tissues, where the number of samples 
per tissue ranges from 10 to more than 200.

While writing an R-function to display the correlation coefficients between
gene A and B in the different tissues as bar-graph, I realized that it may
not be correct to compare the different correlation coefficients directly,
since the number of samples per tissue varyies between 10 and 200.

Thus, the question is: Is there a way to compare different correlation
coefficients and/or apply some kind of normalization?

Assuming that this might be a well known statistical problem I was browsing
statistics books and the web for more information, but could only find the
function "compcorr" which gives a p-value how well you can trust the 
comparison of two correlation coefficients from different samples.

Even though this might currently not be a direct Bioconductor question, it
is certainly a microarray analysis related question. Any suggestions how to
solve this problem would be greatly appreciated.

Best regards
Christian Stratowa


-----Original Message-----
From: Stratowa,Dr.,Christian FEX BIG-AT-V 
Sent: Tuesday, July 13, 2004 14:40
To: 'bioconductor at stat.math.ethz.ch'
Subject: Comparison of correlation coefficients


Dear Bioconductor expeRts

Is it possible to compare correlation coefficients or to normalize 
different correlation coefficients?

Concretely, we have the following situation:
We have gene expression profiles for different tissues, where the 
number of samples per tissue are different, ranging from 10 to 250. 
We are able to determine the correlation between two genes A and B 
for each tissue separately, using "cor.test". However, the question 
arises if the correlation coefficients between different tissues can 
be compared or if they must somehow be "normalized", since the 
number of samples per tissue varyies. 

Searching the web I found the function "compcorr", see:
http://www.fon.hum.uva.nl/Service/Statistics/Two_Correlations.html
http://ftp.sas.com/techsup/download/stat/compcorr.html
and implemented it in R:

compcorr <- function(n1, r1, n2, r2){
# compare two correlation coefficients
# return difference and p-value as list(diff, pval)

#	Fisher Z-transform
	zf1 <- 0.5*log((1 + r1)/(1 - r1))
	zf2 <- 0.5*log((1 + r2)/(1 - r2))

#	difference
	dz <- (zf1 - zf2)/sqrt(1/(n1 - 3) + (1/(n2 - 3)))

#	p-value
	pv <- 2*(1 - pnorm(abs(dz)))

	return(list(diff=dz, pval=pv))
}

Would it make sense to use the resultant p-value to "normalize" the 
correlation coefficients, using: corr <- corr * compcorr()$pval

Is there a better way or an alternative to "normalize" the correlation
coefficients obtained for different tissues?

Thank you in advance for your help.
Since in the company I am not subscribed to bioconductor-help, could you 
please reply to me (in addition to bioconductor-help)

P.S.: I have posted this first at r-help and it was suggested to me to 
post it here, too.

Best regards
Christian Stratowa

=============================================Christian Stratowa, PhD
Boehringer Ingelheim Austria
Dept NCE Lead Discovery - Bioinformatics
Dr. Boehringergasse 5-11
A-1121 Vienna, Austria
Tel.: ++43-1-80105-2470
Fax: ++43-1-80105-2782
email: christian.stratowa at vie.boehringer-ingelheim.com

idimakos@upatras.gr

2004-Jul-21 16:07 UTC

head link

[R] RE: Comparison of correlation coefficients - Details

That sounds very close to a meta-analytic comparison of two statistics. 
As a matter of fact, the Rosenthal & Rubin approach transforms all primary
statistics into Pearson r and then to Fisher's z and then follows with
comparisons.  More, comparisons can take into account sample sizes, or the
value of some other predictor variable.

I believe there is a Rosenthal book on meta-analysis published by Sage
publications, as well as a Brian Mullen book published by Lawrence
Erlbaum.
Brian Mullen's book comes (or used to come) with a meta.exe program to
perform meta-analyses.

Hope this helps,

Ioannis
> Dear all
>
> I apologize for cross-posting, but first it is accepted custom to
> thank the repliers and give a summary, and second I have still
> the feeling that this problem might be a general statistical problem
> and not necessarily related to microarrays only, but I might be wrong.
>
> First, I want to thank Robert Gentleman, Mark Kimpel and Mark Reiners
> for their kind replies. Robert Gentleman kindly pointed me to the
> Bioconductor package "MeasurementError.cor" as alternative to
"cor.test".
> Mark Kimpel suggested that 2-way factorial Anova or the Bioconductor
> package "limma", respectively, may be helpful. Mark Reiners
suggested
> to use the p-value of "cor.test" to test the significance.
>
> Maybe, I miss the point, but being not a statistician I am still unsure
> if it is possible to compare correlation coefficients from different
> sample sets. Both, the p-values from "cor.test" and from
"compcorr",
> could be used as measure of the significance.
> However, is it possible to "normalize" correlation coefficients
from
> different sample sets? Could an expression such as "corr * (1 -
pval)"
> be used for normalization? Maybe, it is not possible to normalize
> correlation coefficients?
> Would a barplot comparing the correlation coefficients between two
> genes for different tissues be meaningful? (Alternatively, I have
> tried to use (1-pval) to calculate the gray-level of the bars.)
>
> Any further suggestions would be appreciated very much.
>
> Best regards
> Christian Stratowa
>
> -----Original Message-----
> From: Stratowa,Dr.,Christian FEX BIG-AT-V
> Sent: Monday, July 19, 2004 15:00
> To: 'bioconductor at stat.math.ethz.ch'
> Subject: Comparison of correlation coefficients - Details
>
>
> Dear all
>
> Maybe, my last mail did not explain my problem correctly:
> Since we are interested, which genes have similar expression profiles in a
> certain tissue or in different tissues, we have calculated the
> correlation coefficients between all 46,000 x 46,000 genes of the
> HG_U133A/B chipset for about 70 tissues, where the number of samples
> per tissue ranges from 10 to more than 200.
>
> While writing an R-function to display the correlation coefficients
> between
> gene A and B in the different tissues as bar-graph, I realized that it may
> not be correct to compare the different correlation coefficients directly,
> since the number of samples per tissue varyies between 10 and 200.
>
> Thus, the question is: Is there a way to compare different correlation
> coefficients and/or apply some kind of normalization?
>
> Assuming that this might be a well known statistical problem I was
> browsing
> statistics books and the web for more information, but could only find the
> function "compcorr" which gives a p-value how well you can trust
the
> comparison of two correlation coefficients from different samples.
>
> Even though this might currently not be a direct Bioconductor question, it
> is certainly a microarray analysis related question. Any suggestions how
> to
> solve this problem would be greatly appreciated.
>
> Best regards
> Christian Stratowa
>
>
> -----Original Message-----
> From: Stratowa,Dr.,Christian FEX BIG-AT-V
> Sent: Tuesday, July 13, 2004 14:40
> To: 'bioconductor at stat.math.ethz.ch'
> Subject: Comparison of correlation coefficients
>
>
> Dear Bioconductor expeRts
>
> Is it possible to compare correlation coefficients or to normalize
> different correlation coefficients?
>
> Concretely, we have the following situation:
> We have gene expression profiles for different tissues, where the
> number of samples per tissue are different, ranging from 10 to 250.
> We are able to determine the correlation between two genes A and B
> for each tissue separately, using "cor.test". However, the
question
> arises if the correlation coefficients between different tissues can
> be compared or if they must somehow be "normalized", since the
> number of samples per tissue varyies.
>
> Searching the web I found the function "compcorr", see:
> http://www.fon.hum.uva.nl/Service/Statistics/Two_Correlations.html
> http://ftp.sas.com/techsup/download/stat/compcorr.html
> and implemented it in R:
>
> compcorr <- function(n1, r1, n2, r2){
> # compare two correlation coefficients
> # return difference and p-value as list(diff, pval)
>
> #	Fisher Z-transform
> 	zf1 <- 0.5*log((1 + r1)/(1 - r1))
> 	zf2 <- 0.5*log((1 + r2)/(1 - r2))
>
> #	difference
> 	dz <- (zf1 - zf2)/sqrt(1/(n1 - 3) + (1/(n2 - 3)))
>
> #	p-value
> 	pv <- 2*(1 - pnorm(abs(dz)))
>
> 	return(list(diff=dz, pval=pv))
> }
>
> Would it make sense to use the resultant p-value to "normalize"
the
> correlation coefficients, using: corr <- corr * compcorr()$pval
>
> Is there a better way or an alternative to "normalize" the
correlation
> coefficients obtained for different tissues?
>
> Thank you in advance for your help.
> Since in the company I am not subscribed to bioconductor-help, could you
> please reply to me (in addition to bioconductor-help)
>
> P.S.: I have posted this first at r-help and it was suggested to me to
> post it here, too.
>
> Best regards
> Christian Stratowa
>
> =============================================> Christian Stratowa, PhD
> Boehringer Ingelheim Austria
> Dept NCE Lead Discovery - Bioinformatics
> Dr. Boehringergasse 5-11
> A-1121 Vienna, Austria
> Tel.: ++43-1-80105-2470
> Fax: ++43-1-80105-2782
> email: christian.stratowa at vie.boehringer-ingelheim.com
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://www.stat.math.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
> http://www.R-project.org/posting-guide.html
>

Unung Istopo Hartanto

2004-Jul-22 07:45 UTC

head link

[R] Population Pyramid Plot

Finally, today I able to produce pyramid plot using any part of script
from R-News Vol 3/2, October 2003 by Paul Murrell "Integrating grid
Graphics output with Base Graphics Output".

Title, i using title function from script in
http://www.r-project.org/misc/acpclust.R .

Maybe, anyone to improve this script. cause, i write it roughly.

Thanks for All,

regards,

Unung
-------------- next part --------------
######################################################
###         Piramida Penduduk                      ###
###         Oleh : Unung Istopo                    ###
###         Enciety Business Consult               ###
###         www.enciety.com                        ###
###         22 Juli 2004                           ###
######################################################

require(gridBase)

# This script from : http://www.r-project.org/misc/acpclust.R
ltitle <-
function(x,backcolor="darkgreen",forecolor="yellow",cex=2,ypos=0.4){
plot(x=c(-1,1),y=c(0,1),xlim=c(0,1),ylim=c(0,1),type="n",axes=FALSE)
polygon(x=c(-1,-1,1,1),y=c(-1,1,1,-1),col=backcolor,border=NA)
text(x=0,y=ypos,pos=4,cex=cex,labels=x,col=forecolor)
}



### Contoh Data
data <-
c(100,110,115,250,350,458,500,567,789,876,600,556,500,465,350,250,145)

data.axes <- c(0,200,400,600,800)

### Kelompok Umur
label <- c("0 - 4", "5 - 9", "10 - 14",
"15 - 19", "20 - 24", "24 - 29",
    "30 - 34", "35 - 39", "40 - 44", "44 -
49", "50 - 54", "55 - 59", "60 - 64",
    "65 - 69", "70 - 74", "75 - 79",
"80+")

#par(mfrow=c(1,2))
#layout(cbind(1,2))

mylayout=layout(matrix(c(1,1,2,3,4,5),ncol=2,byrow=T),widths=c(1/2,1/2),heights=c(lcm(1),lcm(1),1))

#par(mar = c(bottom, left, top, right))

# Judul Atas
par(mar = c(0.1, 0.8, 0.1, 0.1))
ltitle(paste("                                           R Enciety -
Piramida Penduduk"),cex=1.6,ypos=0.4)

# Judul Atas Kiri - Male
par(mar = c(0.1, 0.8, 0.1, 2))
ltitle(paste("                   
Laki-Laki"),cex=1.6,ypos=0.4,backcolor="white",forecolor="darkgreen")

# Judul Atas Kanan - Female
par(mar = c(0.1, 0.8, 0.1, 0.5))
ltitle(paste("                   
Perempuan"),cex=1.6,ypos=0.4,backcolor="white",forecolor="darkgreen")



### Plot Data Bagian Kiri
par(mar = c(5,1.5,1.5,5))
midpts <- barplot(-data,axes=F,horiz=T,col="blue",xlab="(dalam
ribu)")

axis(1,at=-data.axes,labels=FALSE)

# This script from R-News Vol 3/2, October 2003 by Paul Murrell
"Integrating grid Graphics output with Base Graphics Output".

vps <- baseViewports()
par(new=TRUE)
push.viewport(vps$inner,vps$figure,vps$plot)
grid.text(c(as.character(data.axes)),x=unit(-data.axes,"native"),y=unit(-1,"lines"),rot=0)
pop.viewport(3)
par(new=FALSE)



### Plot Data Bagian Kanan
par(mar = c(5,2,1.5,4))
midpts <-
barplot(data,axes=F,horiz=T,col="orange",xlab="(dalam
ribu)")

axis(2,at=midpts,labels=FALSE,col="white")
axis(1,at=data.axes,labels=FALSE)

# This script from R-News Vol 3/2, October 2003 by Paul Murrell
"Integrating grid Graphics output with Base Graphics Output".

vps <- baseViewports()
par(new=TRUE)
push.viewport(vps$inner,vps$figure,vps$plot)
grid.text(c(as.character(data.axes)),x=unit(data.axes,"native"),y=unit(-1,"lines"),rot=0)
pop.viewport(3)

vps <- baseViewports()
par(new=TRUE)
push.viewport(vps$inner,vps$figure,vps$plot)
grid.text(c(as.character(label)),y=unit(midpts,"native"),x=unit(-2,"lines"),rot=0)
pop.viewport(3)

par(new=FALSE)

Maybe Matching Threads

Search for more reasonably related threads

R help - Jul 2004 - RE: Comparison of correlation coefficients - Details

[R] RE: Comparison of correlation coefficients - Details

[R] RE: Comparison of correlation coefficients - Details

[R] Population Pyramid Plot

Maybe Matching Threads