Dear R community,

I am using 6 variables to test for an effect (by linear regression). These 6 variables are strongly correlated with each other, and I would like to find out the number of independent tests that I perform in this calculation. For this I calculated a matrix of correlation coefficients between the variables (see below). But finding the rank of the matrix in R is not the right approach... What else could I do to find the effective number of independent tests?

Any suggestion would be very welcome!
Thanking you and with my best regards, Georg.

> for (a in 1:6){
+   for (b in 1:6){
+     r[a,b] <- summary(lm(unlist(d[a]) ~ unlist(d[b]), na.action = "na.exclude"))$adj.r.squared
+   }
+ }
> r
          SR        SU        ST        DR        DU        DT
SR 1.0000000 0.9636642 0.9554952 0.2975892 0.3211303 0.3314694
SU 0.9636642 1.0000000 0.9101678 0.3324979 0.3331389 0.3323826
ST 0.9554952 0.9101678 1.0000000 0.2756876 0.3031676 0.3501157
DR 0.2975892 0.3324979 0.2756876 1.0000000 0.9981733 0.9674843
DU 0.3211303 0.3331389 0.3031676 0.9981733 1.0000000 0.9977780
DT 0.3314694 0.3323826 0.3501157 0.9674843 0.9977780 1.0000000

*************************
Georg Ehret
Johns Hopkins University
Baltimore, US
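[For reference: the loop over adj.r.squared above can be replaced by a single cor() call, and the eigenvalues of the resulting correlation matrix indicate how many effectively independent dimensions the six variables span. A minimal sketch, using simulated stand-in data since the poster's d is not available:]

```r
## Simulated stand-in for the poster's data frame 'd':
## two latent factors, three noisy copies of each
set.seed(1)
base1 <- rnorm(100)
base2 <- rnorm(100)
d <- data.frame(SR = base1 + rnorm(100, sd = 0.2),
                SU = base1 + rnorm(100, sd = 0.2),
                ST = base1 + rnorm(100, sd = 0.2),
                DR = base2 + rnorm(100, sd = 0.1),
                DU = base2 + rnorm(100, sd = 0.1),
                DT = base2 + rnorm(100, sd = 0.1))

## All pairwise correlations in one call (handles NAs as well)
r <- cor(d, use = "pairwise.complete.obs")

## Eigenvalues of the correlation matrix: they always sum to 6 here,
## and a few large ones mean few effectively independent dimensions
ev <- eigen(r, symmetric = TRUE)$values
round(ev, 3)
```

[With two blocks of near-duplicate variables, the first two eigenvalues absorb almost all of the total, which matches the replies' suggestion to keep two components.]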
It looks like SR, SU and ST are strongly correlated with each other, as are DR, DU and DT. You could run PCA on your 6 variables, pick the first 2 principal components as your new variables, and use those in the regression.

--- On Fri, 11/7/08, Georg Ehret <georgehret at gmail.com> wrote:
> From: Georg Ehret <georgehret at gmail.com>
> Subject: [R] number of effective tests
> To: "r-help" <r-help at stat.math.ethz.ch>
> Received: Friday, 11 July, 2008, 11:46 AM
>
> Dear R community,
> I am using 6 variables to test for an effect (by linear regression).
> These 6 variables are strongly correlated among each other and I would like
> to find out the number of independent tests that I perform in this
> calculation. [...]
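[A minimal sketch of this PCA route, with simulated stand-in data; the variable names and the outcome y are hypothetical, not from the thread:]

```r
## Simulated stand-in: six correlated predictors and one outcome
set.seed(2)
b1 <- rnorm(50)
b2 <- rnorm(50)
d <- data.frame(SR = b1 + rnorm(50, sd = 0.2), SU = b1 + rnorm(50, sd = 0.2),
                ST = b1 + rnorm(50, sd = 0.2), DR = b2 + rnorm(50, sd = 0.1),
                DU = b2 + rnorm(50, sd = 0.1), DT = b2 + rnorm(50, sd = 0.1))
y <- b1 - b2 + rnorm(50)

## PCA on the standardized variables (scale. = TRUE)
pc <- prcomp(d, scale. = TRUE)
summary(pc)                          # first two PCs carry nearly all variance

## Regress the outcome on the first two principal components only
fit <- lm(y ~ pc$x[, 1] + pc$x[, 2])
summary(fit)
```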
Hi,

What do you mean by the effective number of tests? How you approach it also depends on the research tradition in your field. Some fields just include the variables in alternative regressions and then include them jointly.

However, since your variables are so highly correlated (i.e. they convey almost the same information), you almost certainly have to reduce the dimensionality of your data if you want to include them "jointly" (basically you make 2 variables out of your 6, or whatever number). PCA, as Moshe suggested, is a good way. It is typically used when your variables are measured without error (that is, if each of them is a hard-fact number). If the variables are measured with error (e.g. subject responses on a survey), you would typically perform factor analysis.

You may want to standardize each of the six variables before performing PCA or factor analysis so that each of the six has the same scale. Otherwise the variables with the greater variance will be much more influential than the others (that's not the best description of it, but I hope it makes the point).

Look at prcomp() or princomp() for PCA, and at factanal() for factor analysis (there are packages available for factor analysis too, I think).

Best,
Daniel

Georg Ehret wrote:
> Dear R community,
> I am using 6 variables to test for an effect (by linear regression).
> These 6 variables are strongly correlated among each other and I would like
> to find out the number of independent tests that I perform in this
> calculation. [...]
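[An illustrative sketch of the standardize-then-reduce step described above, with simulated stand-in data; the factanal() call assumes two underlying factors, which is a modeling choice, not something stated in the thread:]

```r
## Simulated stand-in: two latent factors, three indicators each
set.seed(3)
b1 <- rnorm(200)
b2 <- rnorm(200)
d <- data.frame(SR = b1 + rnorm(200, sd = 0.3), SU = b1 + rnorm(200, sd = 0.3),
                ST = b1 + rnorm(200, sd = 0.3), DR = b2 + rnorm(200, sd = 0.3),
                DU = b2 + rnorm(200, sd = 0.3), DT = b2 + rnorm(200, sd = 0.3))

ds <- scale(d)                       # each column: mean 0, sd 1
fa <- factanal(ds, factors = 2, scores = "regression")
print(fa$loadings, cutoff = 0.3)     # one factor per block of variables

## fa$scores (n x 2) can replace the six variables in the regression
head(fa$scores)
```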
On Thu, 10 Jul 2008, Georg Ehret wrote:
> Dear R community,
> I am using 6 variables to test for an effect (by linear regression).
> These 6 variables are strongly correlated among each other and I would like
> to find out the number of independent tests that I perform in this
> calculation. [...]

For what purpose? If you are trying to perform a multiple comparisons adjustment, you might do better to skip this bit and go on to a resampling or permutation procedure.

There is an enormous literature on this subject. One example:

@book{West:Youn:1993,
  author    = {Westfall, Peter H. and Young, S. Stanley},
  title     = {Resampling-based multiple testing: {E}xamples and methods for $p$-value adjustment},
  year      = {1993},
  pages     = {340},
  ISBN      = {0471557617},
  publisher = {John Wiley \& Sons},
  keywords  = {Simultaneous inference; Bootstrap}
}

HTH,

Chuck

Charles C. Berry                            (858) 534-2098
Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu             UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901
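[A minimal sketch in the spirit of the resampling adjustment suggested above: a max-T permutation procedure over six single-predictor regressions. The data, the permutation count B, and the choice of test statistic are all illustrative assumptions, not the book's exact method:]

```r
## Simulated stand-in: outcome y and six correlated predictors X
set.seed(4)
n <- 100
b <- rnorm(n)
X <- sapply(1:6, function(j) b + rnorm(n, sd = 0.5))
y <- b + rnorm(n)

## |t| statistic of each single-predictor regression
tstat <- function(y, X) apply(X, 2, function(x)
  abs(coef(summary(lm(y ~ x)))["x", "t value"]))

obs <- tstat(y, X)

## Null distribution of the maximum |t| under permutations of y.
## Because all six tests share each permuted data set, the adjustment
## automatically accounts for the correlation among the predictors.
B <- 500
maxT <- replicate(B, max(tstat(sample(y), X)))

## Adjusted p-value: how often the permutation maximum beats each test
p.adj <- sapply(obs, function(t) mean(maxT >= t))
round(p.adj, 3)
```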