Nikolaos Lampadariou
2008-Aug-17 23:11 UTC
[R] before-after control-impact analysis with R
Hello everybody, In am trying to analyse a BACI experiment and I really want to do it with R (which I find really exciting). So, before moving on I though it would be a good idea to repeat some known experiments which are quite similar to my own. I tried to reproduce 2 published examples but without much success. The first one in particular is a published dataset analysed with SAS by McDonald et al. (2000). They also provide the original data as well as the SAS code. I don't know much about SAS and i really want to stick to R. So here follow 3 questions based on these 2 papers. Paper 1 (McDonald et al. 2000. Analysis of count data from before-after control-impact studies. J. Agric. Biol. Envir Stat., 5:262-279) Data were collected from 1984, 1989, 1990, 1991, 1993, and 1996. Samples from 1984 are considered as before an impact and the remaining years as after the impact. Each year, 96 transects were sampled (36 in the oiled area and 60 in the non-oiled area; "0" is for oiled and "1" for non-oiled). The authors compare 3 different ways of analysing the data including glmm. The data can be reproduced with the following commands (density is fake numbers but I can provide the original data since I've typed them in anyway): >x<-c(rep("0",36),rep("1",60)) >oiled<-rep(x,6) >year<-c(rep(1984,96), rep(1989,96), rep(1990,96), rep(1991,96), rep(1993,96), rep(1996,96)) >density<-runif(576, min=0, max=10) >transect<-c(rep(1:96,6)) >oil<-data.frame("oiled"=oiled, "transect"=transect, "year"=year, "density"=density) Question 1: I can reproduce the results of the repeated measures anova with: >oil.aov1<-aov(density~factor(year)*factor(oiled)+Error(factor(transect)) But why is the following command not working? >oil.aov2<-aov(density~oiled*year + Error(oiled/transect), data=oil) After reading the R-help archive, as well as Chambers and Hasties (Statistical Models in S) and Pinheiro's and Bates (Mixed effects models in S and S-plus) I would expect that the correct model is the oil.aov2. As you might see from the data, transect is nested within oiled and I still get the wrong results when I try +Error(transect) or +Error(oiled:transect) and many other variants. Question 2 (on the same paper): The authors conclude that it is better to analyse the data with a Generalized Linear (Mixed) Model Technique. I tried lme and after reading Douglas Bates article (May 2005, vol. 5/1) also lmer as follows: >oil.lme<-lme(density~year*oiled, random=~1|oiled/transect) or >harvest.lmer<-lmer(abund~H*BA+BA/year+BA/year:H+site:BA+(1|H/site)) but again no luck. I will get always some error messages or some interactions missing. Paper 2 (Keough & Quinn, 2000. Legislative vs. practical protection of an intertidal shoreline in southeastern Australia. Ecol. Appl. 10: 871-881) Data were collected from 1989, 1990, 1991, 1993, 1994, 1995, 1996, 1997). the 1989-1991 are the before impact years and the other 4 years are the after the impact years. Eight sites were samples (2 protected and 6 impacted). In this dataset, site is nested within harvest (H) and year is nested within before-after (BA). Also, site is considered as random by the authors. The data (fake again) can be produced with the following commands: >site<-c(rep(c("A1","A2", "RR1", "RR2", "WT1", "WT2", "WT3", "WT4"),8)) >H<-c(rep(c("exp", "exp", "prot", "pro", "exp", "exp", "exp", "exp"), 8)) >year<-c(rep(1989,8), rep(1990,8), rep(1991,8), rep(1993,8), rep(1994,8), rep(1995,8), rep(1996,8), rep(1997,8)) >BA<-c(rep("bef",24), rep("after",40)) >abund<-runif(64, min=0, max=10) >harvest<-data.frame(abund, BA, H, site, year) Question 3. The authors use a complex anova design and here is part of their anova table which shows the design and the df. source.of.var df Harvesting(H) 1, 6 before-after(BA) 1, 6 H x BA 1, 6 Site(H) 6, 31 Year(BA) 6, 31 Site x BA 6, 31 Year x H 6, 31 Resid. 31 My question again is: Why can't I reproduce the results? When I try a simple anova without any random factors: >harvest.lm<-lm(abund~H*BA+H/site+BA/year+site:BA+year:H) I get close enought but the results are different (at least I think they are different because the nomin. df are different). So obviously I need to assign sites as a random factor somehow. So when I try to include site (which is nested in H) as a random factor and also year nested in BA (as the authors did) the best I can come up with is: >harvest.lme<-lme(abund~H*BA+BA/year+BA/year:H, random=~1|H/site) But I get a warning message (Warning message:In pf(q, df1, df2, lower.tail, log.p) : NaNs produced) and also I don't know where to put the site x BA term (whatever I try I get error messages). I really apologise for the long post but after a week of studying and trying as many ideas and examples I could find and think of I felt that I really need advice from some more experienced users if I really want to use this magnificent tool correctly. Thanks in advance -- ------------------------------------------------------- Nikolaos Lampadariou Hellenic Centre for Marine Research P.O. Box 2214 710 03 Heraklion Crete GREECE e-mail: nlamp at her.hcmr.gr Ph. +30 281 0337849, +30 281 0337806 FAX +30 281 0337822 Web site: http://www.hcmr.gr/
Hi Nikolaos,>> My question again is: Why can't I reproduce the results? When I try a >> simple anova without any random factors:Lack of a "right" result probably has to do with the type of analysis of variance that is being done. The default in R is to use so-called Type I tests, for good reason. SAS, I think, still uses a type of test that many authorities consider to be meaningless, as a general, first-level, test. There is a long, long thread on this, going back to about approx April/May 1999, when a suitable Ripplyism was coined, which I believe found its way into the fortunes package. But RSiteSearch("type III") should do for a start. Also see ?drop1 library(car) ?Anova HTH, Mark. Nikolaos Lampadariou wrote:> > Hello everybody, > > In am trying to analyse a BACI experiment and I really want to do it > with R (which I find really exciting). So, before moving on I though it > would be a good idea to repeat some known experiments which are quite > similar to my own. I tried to reproduce 2 published examples but without > much success. The first one in particular is a published dataset > analysed with SAS by McDonald et al. (2000). They also provide the > original data as well as the SAS code. I don't know much about SAS and i > really want to stick to R. So here follow 3 questions based on these 2 > papers. > > > Paper 1 > (McDonald et al. 2000. Analysis of count data from before-after > control-impact studies. J. Agric. Biol. Envir Stat., 5:262-279) > > Data were collected from 1984, 1989, 1990, 1991, 1993, and 1996. Samples > from 1984 are considered as before an impact and the remaining years as > after the impact. Each year, 96 transects were sampled (36 in the > oiled area and 60 in the non-oiled area; "0" is for oiled and "1" for > non-oiled). The authors compare 3 different ways of analysing the data > including glmm. The data can be reproduced with the following commands > (density is fake numbers but I can provide the original data since I've > typed them in anyway): > > >x<-c(rep("0",36),rep("1",60)) > >oiled<-rep(x,6) > >year<-c(rep(1984,96), rep(1989,96), rep(1990,96), rep(1991,96), > rep(1993,96), rep(1996,96)) > >density<-runif(576, min=0, max=10) > >transect<-c(rep(1:96,6)) > >oil<-data.frame("oiled"=oiled, "transect"=transect, "year"=year, > "density"=density) > > Question 1: > I can reproduce the results of the repeated measures anova with: > >oil.aov1<-aov(density~factor(year)*factor(oiled)+Error(factor(transect)) > > But why is the following command not working? > >oil.aov2<-aov(density~oiled*year + Error(oiled/transect), data=oil) > > After reading the R-help archive, as well as Chambers and Hasties > (Statistical Models in S) and Pinheiro's and Bates (Mixed effects models > in S and S-plus) I would expect that the correct model is the oil.aov2. > As you might see from the data, transect is nested within oiled and I > still get the wrong results when I try +Error(transect) or > +Error(oiled:transect) and many other variants. > > Question 2 (on the same paper): > The authors conclude that it is better to analyse the data with a > Generalized Linear (Mixed) Model Technique. I tried lme and after > reading Douglas Bates article (May 2005, vol. 5/1) also lmer as follows: > >oil.lme<-lme(density~year*oiled, random=~1|oiled/transect) > or > >harvest.lmer<-lmer(abund~H*BA+BA/year+BA/year:H+site:BA+(1|H/site)) > but again no luck. I will get always some error messages or some > interactions missing. > > > Paper 2 > (Keough & Quinn, 2000. Legislative vs. practical protection of an > intertidal shoreline in southeastern Australia. Ecol. Appl. 10: 871-881) > > Data were collected from 1989, 1990, 1991, 1993, 1994, 1995, 1996, > 1997). the 1989-1991 are the before impact years and the other 4 years > are the after the impact years. Eight sites were samples (2 protected > and 6 impacted). In this dataset, site is nested within harvest (H) and > year is nested within before-after (BA). Also, site is considered as > random by the authors. The data (fake again) can be produced with the > following commands: > > >site<-c(rep(c("A1","A2", "RR1", "RR2", "WT1", "WT2", "WT3", "WT4"),8)) > >H<-c(rep(c("exp", "exp", "prot", "pro", "exp", "exp", "exp", "exp"), 8)) > >year<-c(rep(1989,8), rep(1990,8), rep(1991,8), rep(1993,8), > rep(1994,8), rep(1995,8), rep(1996,8), rep(1997,8)) > >BA<-c(rep("bef",24), rep("after",40)) > >abund<-runif(64, min=0, max=10) > >harvest<-data.frame(abund, BA, H, site, year) > > Question 3. > The authors use a complex anova design and here is part of their anova > table which shows the design and the df. > > source.of.var df > Harvesting(H) 1, 6 > before-after(BA) 1, 6 > H x BA 1, 6 > Site(H) 6, 31 > Year(BA) 6, 31 > Site x BA 6, 31 > Year x H 6, 31 > Resid. 31 > > > My question again is: Why can't I reproduce the results? When I try a > simple anova without any random factors: > >harvest.lm<-lm(abund~H*BA+H/site+BA/year+site:BA+year:H) > I get close enought but the results are different (at least I think they > are different because the nomin. df are different). > > So obviously I need to assign sites as a random factor somehow. So when > I try to include site (which is nested in H) as a random factor and also > year nested in BA (as the authors did) the best I can come up with is: > >harvest.lme<-lme(abund~H*BA+BA/year+BA/year:H, random=~1|H/site) > But I get a warning message (Warning message:In pf(q, df1, df2, > lower.tail, log.p) : NaNs produced) and also I don't know where to put > the site x BA term (whatever I try I get error messages). > > I really apologise for the long post but after a week of studying and > trying as many ideas and examples I could find and think of I felt that > I really need advice from some more experienced users if I really want > to use this magnificent tool correctly. > > Thanks in advance > > -- > ------------------------------------------------------- > Nikolaos Lampadariou > Hellenic Centre for Marine Research > P.O. Box 2214 > 710 03 Heraklion Crete > GREECE > > e-mail: nlamp at her.hcmr.gr > Ph. +30 281 0337849, +30 281 0337806 > FAX +30 281 0337822 > Web site: http://www.hcmr.gr/ > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > >-- View this message in context: http://www.nabble.com/before-after-control-impact-analysis-with-R-tp19024219p19027892.html Sent from the R help mailing list archive at Nabble.com.