Dear Helpers,

I just started working with R and I'm a bit overloaded with information.

My data are from marsupials reintroduced into an area. I have weight (wt) and hind foot length (pes) as continuous variables, and origin and gender as categorical variables. condition is just the residuals I took from the model.

> names(dat1)
[1] "wt"        "pes"       "origin"    "gender"    "condition"

My model after model simplification so far:

model1 <- lm(log(wt) ~ log(pes) + origin + gender + gender:log(pes))

--> six intercepts and two slopes

The problem is that there are some things I don't know how to account for in my analysis:

1. Very different sample sizes for each of the treatments

> tapply(log(wt), origin, length)
captive    site    wild
    119     149      19

2. Substantial differences in the range of values taken by the covariate (leg length) between treatments

> tapply(pes, origin, var)
 captive     site     wild
82.43601 71.44442 60.42544
> tapply(pes, origin, mean)
 captive     site     wild
147.3261 144.8698 148.2895

4. Outliers
5. Poorly behaved residuals

Thanks in advance for any answers; I am open to any other kind of analysis.

Tobi
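To see where the "six intercepts and two slopes" come from, here is a minimal sketch on simulated data. The data frame is a hypothetical stand-in for dat1 (the gender levels "female"/"male" and all numbers are assumptions, not Tobi's data). It fits model1 and rebuilds, from its six coefficients, one intercept per origin-by-gender cell (origin and gender shift the intercept additively) and one slope on log(pes) per gender.

```r
# Hypothetical stand-in for dat1: simulated values, NOT the real data
set.seed(3)
n <- c(captive = 119, site = 149, wild = 19)
dat1 <- data.frame(origin = factor(rep(names(n), n)),
                   gender = factor(sample(c("female", "male"), sum(n), TRUE)),
                   pes    = rnorm(sum(n), mean = 146, sd = 9))
dat1$wt <- exp(-4 + 2.5 * log(dat1$pes) + rnorm(sum(n), sd = 0.1))

model1 <- lm(log(wt) ~ log(pes) + origin + gender + gender:log(pes),
             data = dat1)
b <- coef(model1)  # six coefficients in total

# Intercept (value at log(pes) = 0) for each of the 3 x 2 origin-by-gender cells
grid <- expand.grid(origin = levels(dat1$origin),
                    gender = levels(dat1$gender))
X0 <- model.matrix(~ origin + gender, grid)
intercepts <- drop(X0 %*% b[colnames(X0)])

# One slope per gender: the baseline slope, plus the interaction term for males
int_term <- b[grepl(":", names(b), fixed = TRUE)]
slopes <- c(female = b[["log(pes)"]],
            male   = b[["log(pes)"]] + unname(int_term))

cbind(grid, intercept = round(intercepts, 3))
slopes
```

Looking up the interaction coefficient by its ":" rather than by its exact name avoids depending on how R orders the variables in the label.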
Tobi,

I think that it would be easier to provide advice if you were more explicit about what the model will be used for, and about the structure of the data. Is there only one measurement for each marsupial? Is the goal to

a) produce a model to predict marsupial weight given the other variables (and if so, why),
b) produce a model to estimate the effect of introduction on weight, with the other variables being nuisance variables (and if so, why), or
c) something else (and why)?

All these factors affect the position that you could adopt on the questions that you have.

Andrew

On Sun, May 04, 2008 at 04:00:16AM +0200, Tobias Erik Reiners wrote:
> Thanks in advance for any answers; I am open to any other kind of analysis.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Andrew Robinson
Department of Mathematics and Statistics            Tel: +61-3-8344-6410
University of Melbourne, VIC 3010 Australia         Fax: +61-3-8344-4599
http://www.ms.unimelb.edu.au/~andrewpr
http://blogs.mbs.edu/fishing-in-the-bay/
For points 4 and 5, you could use a robust linear fit. One way to do that is to use rlm() from package MASS, which is used in several examples in the book that package MASS supports.

On Sun, 4 May 2008, Tobias Erik Reiners wrote:
> 4.Outliers
> 5.Poorly behaved residuals

--
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
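A minimal sketch of the rlm() suggestion, assuming the same model formula as model1. The data frame is a simulated, hypothetical stand-in for dat1 with five outliers planted on purpose; nothing here comes from the real data. It fits the model by least squares and by rlm() (Huber M-estimation, the function's default) and shows which observations the robust fit down-weights.

```r
library(MASS)  # rlm(); MASS ships with R as a recommended package

# Hypothetical stand-in for dat1: simulated values, NOT the real data
set.seed(1)
n <- c(captive = 119, site = 149, wild = 19)
dat1 <- data.frame(origin = factor(rep(names(n), n)),
                   gender = factor(sample(c("female", "male"), sum(n), TRUE)),
                   pes    = rnorm(sum(n), mean = 146, sd = 9))
dat1$wt <- exp(-4 + 2.5 * log(dat1$pes) + rnorm(sum(n), sd = 0.1))
dat1$wt[1:5] <- 2 * dat1$wt[1:5]  # plant five gross outliers (rows 1-5)

form <- log(wt) ~ log(pes) + origin + gender + gender:log(pes)
fit_ls  <- lm(form, data = dat1)
fit_rob <- rlm(form, data = dat1)  # Huber psi function by default

# Least-squares and robust coefficients side by side
round(cbind(lm = coef(fit_ls), rlm = coef(fit_rob)), 3)

# Final IWLS weights: values below 1 mark down-weighted observations
round(sort(fit_rob$w)[1:5], 3)
```

With real data you would drop the simulation lines; comparing the two coefficient columns gives a rough idea of how much the outlying points move the least-squares fit.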
Hi Tobias,

If you want to do inferential statistics with groups that differ systematically on the covariate, you will need to be extra careful in your interpretation. See, e.g., Miller, G. A. & Chapman, J. P. (2001). Misunderstanding analysis of covariance. Journal of Abnormal Psychology, 110, 40-48, and much similar literature.

That said, with your wide variation in pes you may want to consider restricted cubic splines ("natural splines") in Frank Harrell's Hmisc and Design packages. At the very least, it would be interesting to test whether the influence of pes really is linear, which can be done easily with splines. See also Harrell, F. E. (2001). Regression Modeling Strategies. Springer.

Good luck with your small furry creatures!

Stephan

Tobias Erik Reiners wrote:
> 2.Substantial differences in the range of values taken by the covariate
> (leg length) between treatments
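A minimal sketch of the linearity test Stephan describes. It swaps in base R's splines::ns(), which fits natural (restricted) cubic splines with a different basis and different default knots from Harrell's rcs(); the choice df = 3 is an assumption. The data are a simulated, hypothetical stand-in for dat1 with a deliberately curved relationship, and the gender:log(pes) term is left out to keep the comparison simple.

```r
library(splines)  # ns(): natural cubic spline basis, part of base R

# Hypothetical stand-in for dat1: simulated with a curved log(wt)-log(pes) link
set.seed(2)
n <- c(captive = 119, site = 149, wild = 19)
dat1 <- data.frame(origin = factor(rep(names(n), n)),
                   gender = factor(sample(c("female", "male"), sum(n), TRUE)),
                   pes    = rnorm(sum(n), mean = 146, sd = 9))
lp <- log(dat1$pes) - mean(log(dat1$pes))  # centred log(pes)
dat1$wt <- exp(8 + 2 * lp - 40 * lp^2 + rnorm(sum(n), sd = 0.05))

# Straight-line effect of log(pes) versus a natural spline with 3 df
fit_lin <- lm(log(wt) ~ origin + gender + log(pes), data = dat1)
fit_spl <- lm(log(wt) ~ origin + gender + ns(log(pes), df = 3), data = dat1)

# A straight line is itself a natural cubic spline, so fit_lin is nested in
# fit_spl; the F test asks whether the extra curvature is needed
anova(fit_lin, fit_spl)
```

On real data, a small p-value would suggest the effect of pes is not linear on the log scale; a large one does not prove that it is.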
On Sat, May 3, 2008 at 9:00 PM, Tobias Erik Reiners <Tobias.Reiners at bio.uni-giessen.de> wrote:
> Thanks in advance for any answers; I am open to any other kind of analysis.

How about starting with some graphics? For example, with ggplot2 the following would give you some clues as to whether your models are appropriate or not:

library(ggplot2)
qplot(pes, wt, data = dat1, colour = gender, facets = . ~ origin, log = "xy") +
  geom_smooth(method = lm)
qplot(pes, wt, data = dat1, facets = gender ~ origin, log = "xy") +
  geom_smooth(method = lm)

If you want to see the effect of a robust fit, as suggested by Brian Ripley, replace lm with rlm (from MASS, so run library(MASS) first).

Hadley

--
http://had.co.nz/