Dear R community,
I am using 6 variables to test for an effect (by linear regression).
These 6 variables are strongly correlated with each other and I would like
to find out the number of independent tests that I perform in this
calculation. For this I calculated a matrix of pairwise adjusted R^2 values
between the variables (see below). But finding the rank of that matrix in R
does not seem to be the right approach... What else could I do to find the
effective number of independent tests?
Any suggestion would be very welcome!
Thanking you and with my best regards, Georg.
> r <- matrix(NA, 6, 6, dimnames = list(names(d), names(d)))
> for (a in 1:6) {
+   for (b in 1:6) {
+     # na.action belongs in the lm() call, not in summary()
+     r[a, b] <- summary(lm(unlist(d[a]) ~ unlist(d[b]),
+                           na.action = na.exclude))$adj.r.squared
+   }
+ }
> r
SR SU ST DR DU DT
SR 1.0000000 0.9636642 0.9554952 0.2975892 0.3211303 0.3314694
SU 0.9636642 1.0000000 0.9101678 0.3324979 0.3331389 0.3323826
ST 0.9554952 0.9101678 1.0000000 0.2756876 0.3031676 0.3501157
DR 0.2975892 0.3324979 0.2756876 1.0000000 0.9981733 0.9674843
DU 0.3211303 0.3331389 0.3031676 0.9981733 1.0000000 0.9977780
DT 0.3314694 0.3323826 0.3501157 0.9674843 0.9977780 1.0000000
*************************
Georg Ehret
Johns Hopkins University
Baltimore, US
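Incidentally, the pairwise table above can be produced in one call with cor(), and the eigenvalues of the correlation matrix feed one published heuristic for the effective number of tests (Nyholt 2004; that formula is not from this thread). A sketch, assuming d is a data frame whose six numeric columns are SR, SU, ST, DR, DU and DT:

```
# Pairwise Pearson correlations, handling NAs pair by pair
R <- cor(d, use = "pairwise.complete.obs")

# Note: the table in the post holds adjusted R^2 values from lm(),
# which are close to, but not the same as, squared correlations.
R2 <- R^2

# Nyholt (2004) heuristic: effective number of independent tests
# estimated from the spread of the eigenvalues of the correlation matrix
lambda <- eigen(R)$values
M <- length(lambda)
Meff <- 1 + (M - 1) * (1 - var(lambda) / M)
Meff
```

With eigenvalues as unequal as this correlation matrix suggests, Meff will come out well below 6.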
It looks like SR, SU and ST are strongly correlated with each other, as are DR, DU and DT. You could try PCA on your 6 variables, keep the first two principal components as your new variables, and use those in the regression.

--- On Fri, 11/7/08, Georg Ehret <georgehret at gmail.com> wrote:
> From: Georg Ehret <georgehret at gmail.com>
> Subject: [R] number of effective tests
> To: "r-help" <r-help at stat.math.ethz.ch>
> Received: Friday, 11 July, 2008, 11:46 AM
> [quoted message trimmed]
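A sketch of that suggestion, assuming the six variables sit in a data frame d and the outcome is a vector y (both names are placeholders):

```
# Standardize and run PCA on the six predictors
pc <- prcomp(d, center = TRUE, scale. = TRUE)

# Keep the first two principal components as new predictors
scores <- as.data.frame(pc$x[, 1:2])

# Regress the outcome on the two components
fit <- lm(y ~ PC1 + PC2, data = cbind(scores, y = y))
summary(fit)

# Check how much variance the first two components capture
summary(pc)
```

If the first two components do not capture most of the variance, keep more of them.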
Hi,

what do you mean by the effective number of tests? How you approach it also depends on the research tradition in your field. Some fields just include the variables in alternative regressions and then include them jointly.

However, since your variables are so highly correlated (i.e. they convey almost the same information), you almost certainly have to reduce the dimensionality of your data if you want to include them "jointly" (basically you make 2 variables out of your 6, or whatever number). PCA, as Moshe suggested, is a good way. It is typically used when your variables are measured without error (that is, if each of them is a hard-fact number). If the variables are measured with error (e.g. subject responses on a survey), you would typically perform factor analysis instead.

You may want to standardize each of the six variables before performing PCA or factor analysis so that each of the six has the same scale. Otherwise the variables with the greater variance will be much more influential than the others (that is not the best description of it, but I hope it makes the point).

Look at prcomp() or princomp() for PCA and at factanal() for factor analysis (there are packages available for factor analysis too, I think).

Best,
Daniel
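A minimal sketch of the factor-analysis route, again assuming the six variables are the columns of a data frame d:

```
# Maximum-likelihood factor analysis with 2 factors; factanal()
# works on the correlation matrix, so the variables are
# effectively standardized internally
fa <- factanal(d, factors = 2, scores = "regression")
print(fa)

# The factor scores can then replace the six original
# variables as predictors in a regression
head(fa$scores)
```

The number of factors (2 here) is an assumption based on the two blocks visible in the correlation matrix; the test reported by factanal() can help check it.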
On Thu, 10 Jul 2008, Georg Ehret wrote:

> Dear R community,
> I am using 6 variables to test for an effect (by linear regression).
> These 6 variables are strongly correlated among each other and I would like
> to find out the number of independent tests that I perform in this
> calculation.

For what purpose? If you are trying to perform a multiple comparisons adjustment, you might do better to skip this step and go straight to a resampling or permutational procedure. There is an enormous literature on this subject. One example:

@book{West:Youn:1993,
  author    = {Westfall, Peter H. and Young, S. Stanley},
  title     = {Resampling-based multiple testing: {E}xamples and methods for $p$-value adjustment},
  year      = {1993},
  pages     = {340},
  ISBN      = {0471557617},
  publisher = {John Wiley \& Sons},
  keywords  = {Simultaneous inference; Bootstrap}
}

HTH,
Chuck
Charles C. Berry                            (858) 534-2098
Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu             UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901
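As an illustration of the resampling idea, a single-step max-|t| permutation adjustment might look like the sketch below (y and d are placeholders for the outcome vector and the data frame of six predictors; this is a sketch of the general approach, not code from the book cited above):

```
set.seed(1)
B <- 1000

# Observed |t| statistics for the six single-predictor regressions
obs_t <- sapply(d, function(x) abs(summary(lm(y ~ x))$coefficients[2, 3]))

# Null distribution of the maximum |t| across the six tests,
# obtained by permuting the outcome (breaking any real association
# while preserving the correlation structure among the predictors)
max_t <- replicate(B, {
  yp <- sample(y)
  max(sapply(d, function(x) abs(summary(lm(yp ~ x))$coefficients[2, 3])))
})

# Adjusted p-values: the fraction of permutations in which the
# maximum |t| exceeds each observed |t|
p_adj <- sapply(obs_t, function(t0) mean(max_t >= t0))
p_adj
```

Because the permutations keep the predictors' correlations intact, this adjustment is automatically less conservative than a Bonferroni correction for 6 tests, which is exactly what an "effective number of tests" correction tries to approximate.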