Kaiyin Zhong
2011-Dec-04 15:36 UTC
[R] Complex multiple t tests in a data frame with several id factors
I have assayed the concentrations of various metal elements in different anatomic regions of two strains of mice. Now, for each element, in each region, I want to do a t test to find whether there is any difference between the two strains. Here is what I did (using simulated data as an example): # create the data frame> elemconc = data.frame(expand.grid(id=1:3, geno=c('exp', 'wt'), region=c('brain', 'spine'), elem=c('fe', 'cu', 'zn')), conc=rnorm(36, 10)) > elemconcid geno region elem conc 1 1 exp brain fe 8.497498 2 2 exp brain fe 9.280944 3 3 exp brain fe 9.726271 4 1 wt brain fe 11.556397 5 2 wt brain fe 10.992550 6 3 wt brain fe 9.711200 7 1 exp spine fe 11.168603 8 2 exp spine fe 9.331127 9 3 exp spine fe 11.048226 10 1 wt spine fe 8.480867 11 2 wt spine fe 8.887062 12 3 wt spine fe 8.329797 13 1 exp brain cu 10.242652 14 2 exp brain cu 9.865984 15 3 exp brain cu 9.163728 16 1 wt brain cu 11.099385 17 2 wt brain cu 9.364261 18 3 wt brain cu 9.718322 19 1 exp spine cu 10.720157 20 2 exp spine cu 11.505430 21 3 exp spine cu 9.499359 22 1 wt spine cu 9.855950 23 2 wt spine cu 10.120489 24 3 wt spine cu 9.526252 25 1 exp brain zn 9.736196 26 2 exp brain zn 11.938710 27 3 exp brain zn 9.668625 28 1 wt brain zn 9.961574 29 2 wt brain zn 10.461621 30 3 wt brain zn 9.873667 31 1 exp spine zn 9.708067 32 2 exp spine zn 10.109309 33 3 exp spine zn 10.973387 34 1 wt spine zn 8.406536 35 2 wt spine zn 7.797746 36 3 wt spine zn 11.127984 # use tapply to aggregate> tapply(elemconc$conc, elemconc[c('elem', 'region')], function(x) x)region elem brain spine fe Numeric,6 Numeric,6 cu Numeric,6 Numeric,6 zn Numeric,6 Numeric,6 # check whether the order of data has been preserved after aggregation> x['fe', 'brain'][[1]] [1] 8.497498 9.280944 9.726271 11.556397 10.992550 9.711200 # create an external factor for strain grouping> tmpgeno = rep(c('exp', 'wt'), each=3) > tmpgeno[1] "exp" "exp" "exp" "wt" "wt" "wt" # do the t test using the grouping factor> x = tapply(elemconc$conc, elemconc[c('elem', 'region')], function(x) t.test(x~tmpgeno) ) > xregion elem brain spine fe List,9 List,9 cu List,9 List,9 zn List,9 List,9 I believe I have made no mistakes so far, but I wonder is there a better way of doing this? -- Kaiyin Zhong ------------------------------------------------------------------------------------------------------------------ Neuroscience Research Institute, Peking University, Beijing, P.R.China 100038
Bert Gunter
2011-Dec-04 16:01 UTC
[R] Complex multiple t tests in a data frame with several id factors
The concentrations of the different metals within an animal are correlated, so that doing as you suggest will almost certainly result in nonsense P values. So I suggest you seek local statistical help or, failing that, post on a statistical forum like stats.stackexchange.com . There are various multivariate packages -- check e.g. the ChemPhys and Multivariate task views -- that may be pertinent, but your post suggests that you probably need some help to use them. Ergo my suggestion above. Cheers, Bert On Sun, Dec 4, 2011 at 7:36 AM, Kaiyin Zhong <kindlychung at gmail.com> wrote:> I have assayed the concentrations of various metal elements in > different anatomic regions of two strains of mice. Now, for each > element, in each region, I want to do a t test to find whether there > is any difference between the two strains. > > Here is what I did (using simulated data as an example): > > # create the data frame >> elemconc = data.frame(expand.grid(id=1:3, geno=c('exp', 'wt'), region=c('brain', 'spine'), elem=c('fe', 'cu', 'zn')), conc=rnorm(36, 10)) >> elemconc > ? id geno region elem ? ? ?conc > 1 ? 1 ?exp ?brain ? fe ?8.497498 > 2 ? 2 ?exp ?brain ? fe ?9.280944 > 3 ? 3 ?exp ?brain ? fe ?9.726271 > 4 ? 1 ? wt ?brain ? fe 11.556397 > 5 ? 2 ? wt ?brain ? fe 10.992550 > 6 ? 3 ? wt ?brain ? fe ?9.711200 > 7 ? 1 ?exp ?spine ? fe 11.168603 > 8 ? 2 ?exp ?spine ? fe ?9.331127 > 9 ? 3 ?exp ?spine ? fe 11.048226 > 10 ?1 ? wt ?spine ? fe ?8.480867 > 11 ?2 ? wt ?spine ? fe ?8.887062 > 12 ?3 ? wt ?spine ? fe ?8.329797 > 13 ?1 ?exp ?brain ? cu 10.242652 > 14 ?2 ?exp ?brain ? cu ?9.865984 > 15 ?3 ?exp ?brain ? cu ?9.163728 > 16 ?1 ? wt ?brain ? cu 11.099385 > 17 ?2 ? wt ?brain ? cu ?9.364261 > 18 ?3 ? wt ?brain ? cu ?9.718322 > 19 ?1 ?exp ?spine ? cu 10.720157 > 20 ?2 ?exp ?spine ? cu 11.505430 > 21 ?3 ?exp ?spine ? cu ?9.499359 > 22 ?1 ? wt ?spine ? cu ?9.855950 > 23 ?2 ? wt ?spine ? cu 10.120489 > 24 ?3 ? wt ?spine ? cu ?9.526252 > 25 ?1 ?exp ?brain ? zn ?9.736196 > 26 ?2 ?exp ?brain ? zn 11.938710 > 27 ?3 ?exp ?brain ? zn ?9.668625 > 28 ?1 ? wt ?brain ? zn ?9.961574 > 29 ?2 ? wt ?brain ? zn 10.461621 > 30 ?3 ? wt ?brain ? zn ?9.873667 > 31 ?1 ?exp ?spine ? zn ?9.708067 > 32 ?2 ?exp ?spine ? zn 10.109309 > 33 ?3 ?exp ?spine ? zn 10.973387 > 34 ?1 ? wt ?spine ? zn ?8.406536 > 35 ?2 ? wt ?spine ? zn ?7.797746 > 36 ?3 ? wt ?spine ? zn 11.127984 > > # use tapply to aggregate >> tapply(elemconc$conc, elemconc[c('elem', 'region')], function(x) x) > ? ?region > elem brain ? ? spine > ?fe Numeric,6 Numeric,6 > ?cu Numeric,6 Numeric,6 > ?zn Numeric,6 Numeric,6 > > # check whether the order of data has been preserved after aggregation >> x['fe', 'brain'] > [[1]] > [1] ?8.497498 ?9.280944 ?9.726271 11.556397 10.992550 ?9.711200 > > # create an external factor for strain grouping >> tmpgeno = rep(c('exp', 'wt'), each=3) >> tmpgeno > [1] "exp" "exp" "exp" "wt" ?"wt" ?"wt" > > # do the t test using the grouping factor >> x = tapply(elemconc$conc, elemconc[c('elem', 'region')], function(x) t.test(x~tmpgeno) ) >> x > ? ?region > elem brain ?spine > ?fe List,9 List,9 > ?cu List,9 List,9 > ?zn List,9 List,9 > > I believe I have made no mistakes so far, but I wonder is there a > better way of doing this? > > > -- > Kaiyin Zhong > ------------------------------------------------------------------------------------------------------------------ > Neuroscience Research Institute, Peking University, Beijing, P.R.China 100038 > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.-- Bert Gunter Genentech Nonclinical Biostatistics Internal Contact Info: Phone: 467-7374 Website: http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm