Hello Helpers,

I have some problems with fitting the model for my data. My literature (Crawley's textbook) calls this non-normality of errors: I get a banana-shaped Q-Q plot, with the opening of the banana pointing downwards.

Structure of data:

     origin   wt   pes gender
1      wild 5.35 147.0   male
2      wild 5.90 148.0   male
3      wild 6.00 156.0   male
4      wild 7.50 157.0   male
5      wild 5.90 148.0   male
6      wild 5.95 148.0   male
7      wild 8.55 160.5   male
8      wild 5.90 148.0   male
9      wild 8.45 161.0   male
10     wild 4.90 147.0   male
11     wild 6.80 153.0   male
12     wild 5.75 146.0   male
13     wild 8.60 160.0   male
14  captive 6.85 159.0   male
15  captive 7.00 160.0   male
16  captive 6.80 155.0   male
..
...
283    site 4.10 130.4 female
284    site 3.55 131.1 female
285    site 4.20 135.7 female
286    site 3.45 128.0 female
287    site 3.65 125.3 female

The goal of my analysis is to work out what effect the categorical factors (origin, gender) have on the relation log(wt) ~ log(pes), i.e. on condition as an index of fat reserves. Does the source (origin) of translocated animals have an effect on their performance (condition) in the new area?

I already have a best-fit model, and it looks quite good (or does it? see below): two slopes (gender difference) and 6 intercepts (3 origin levels x 2 gender levels).

lm(formula = log(wt) ~ log(pes) + origin + gender + gender:log(pes))

Residuals:
     Min       1Q   Median       3Q      Max
-0.54181 -0.07671  0.01520  0.09474  0.28818

Coefficients:
                    Estimate Std. Error t value Pr(>|t|)
(Intercept)         -7.39879    1.97605  -3.744 0.000219 ***
log(pes)             1.78020    0.40118   4.437 1.31e-05 ***
originsite           0.06572    0.01935   3.397 0.000781 ***
originwild           0.07655    0.03552   2.155 0.032011 *
gendermale          -9.32418    2.37476  -3.926 0.000109 ***
log(pes):gendermale  1.90393    0.47933   3.972 9.06e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.1433 on 281 degrees of freedom
Multiple R-Squared: 0.7227, Adjusted R-squared: 0.7177
F-statistic: 146.4 on 5 and 281 DF,  p-value: < 2.2e-16

When I plot this model, I get a banana shape in the Normal Q-Q plot (with the open side pointing downwards), indicating non-normality of the residuals. How should I handle this?

Do I have unbalanced data? Number of animals per origin:

captive  site  wild
    119   149    19

My problem is that I can see my data are not as good as the model summary suggests. Should I include another term in my model formula? I think I have to differentiate more, but I don't know how (contrasts? TukeyHSD? Akaike Information Criterion? lme()?); there are too many different ways out there.

Cheers,
Tobi
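A minimal sketch of how the diagnostics described above could be reproduced in R. The full data set is not in the post, so the sketch builds a simulated stand-in data frame, `tobi` (a hypothetical name), with the same columns and the origin counts quoted above (119/149/19); every value, including the negatively skewed errors, is a synthetic assumption and says nothing about the real animals. It fits the same lm() formula, draws the normal Q-Q plot of the residuals, computes their sample skewness, tabulates origin by gender, and, as one possible way to ask "should I include another term?", compares the model with an added origin:gender interaction using anova() and AIC().

```r
# Hypothetical, SIMULATED stand-in for the real data (only the origin counts come from the post)
set.seed(1)
origin <- factor(rep(c("captive", "site", "wild"), c(119, 149, 19)))
n <- length(origin)                                    # 287 animals
gender <- factor(sample(c("female", "male"), n, replace = TRUE))
pes <- ifelse(gender == "male", rnorm(n, 152, 6), rnorm(n, 130, 5))
eps <- 0.1 * (1 - rexp(n))                             # mean 0, long left tail (negative skew)
wt <- exp(-7.4 + 1.8 * log(pes) + eps)
tobi <- data.frame(origin, wt, pes, gender)

# The model from the post, with an explicit data argument
fit <- lm(log(wt) ~ log(pes) + origin + gender + gender:log(pes), data = tobi)

# Normal Q-Q plot of the raw residuals; plot(fit, which = 2) shows the standardized version
qqnorm(resid(fit)); qqline(resid(fit))

# Sample skewness: a clearly negative value goes with an arch-shaped Q-Q plot
# whose two ends both fall below the reference line
skew <- function(x) { x <- x - mean(x); mean(x^3) / mean(x^2)^1.5 }
skew(resid(fit))

# Balance check: number of animals in each origin x gender cell
with(tobi, table(origin, gender))

# One way to ask whether an extra term helps: nested F-test and AIC
fit2 <- update(fit, . ~ . + origin:gender)
anova(fit, fit2)
AIC(fit, fit2)
```

Unequal cell sizes do not invalidate lm(), but they do mean that sequential (type I) tests from anova() depend on the order of the terms in the formula.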
Dear Tobias,

Your observation that "When plot [the residuals from?] this model I get a banana-shape in Normal Q-Q Plot(with open site [side?] pointing downwards)," suggests that the residuals are negatively skewed, which in turn suggests that using log(wt) as the response variable may have been ill-advised. Perhaps simply using wt, or a weaker transformation such as sqrt(wt), would produce better-behaved residuals.

I hope this helps,
 John

------------------------------
John Fox, Professor
Department of Sociology
McMaster University
Hamilton, Ontario, Canada
web: socserv.mcmaster.ca/jfox

-----Original Message-----
From: r-help-bounces at r-project.org On Behalf Of Tobias Erik Reiners
Sent: May-04-08 5:56 AM
To: r-help at r-project.org
Subject: [R] Ancova_non-normality of errors
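John's suggestion can be tried directly by refitting the same right-hand side with log(wt), sqrt(wt) and wt as the response and comparing the skewness of the residuals (skewness does not depend on the scale of the response, so the three numbers are comparable). The sketch again uses a simulated, hypothetical stand-in data frame `tobi`; keeping log(pes) on the right-hand side for all three responses is an assumption. As an optional extra that is not part of John's message, MASS::boxcox() profiles the Box-Cox power family, in which lambda = 0 corresponds to log, 0.5 to sqrt, and 1 to no transformation.

```r
library(MASS)   # for boxcox(); MASS ships with standard R installations

# Hypothetical, SIMULATED stand-in for the real data
set.seed(1)
origin <- factor(rep(c("captive", "site", "wild"), c(119, 149, 19)))
n <- length(origin)
gender <- factor(sample(c("female", "male"), n, replace = TRUE))
pes <- ifelse(gender == "male", rnorm(n, 152, 6), rnorm(n, 130, 5))
wt <- exp(-7.4 + 1.8 * log(pes) + 0.1 * (1 - rexp(n)))
tobi <- data.frame(origin, wt, pes, gender)

skew <- function(x) { x <- x - mean(x); mean(x^3) / mean(x^2)^1.5 }

# Same right-hand side, three responses: strong (log), weaker (sqrt), none
fit_log  <- lm(log(wt)  ~ log(pes) + origin + gender + gender:log(pes), data = tobi)
fit_sqrt <- lm(sqrt(wt) ~ log(pes) + origin + gender + gender:log(pes), data = tobi)
fit_raw  <- lm(wt       ~ log(pes) + origin + gender + gender:log(pes), data = tobi,
               y = TRUE)   # keep the response so boxcox() can use it directly

# Residual skewness for each response; values nearer 0 mean more symmetric residuals
sk <- c(log = skew(resid(fit_log)), sqrt = skew(resid(fit_sqrt)), raw = skew(resid(fit_raw)))
round(sk, 3)

# Optional: Box-Cox profile log-likelihood over a grid of powers
bc <- boxcox(fit_raw, lambda = seq(-1, 2, by = 0.05), plotit = FALSE)
lambda_hat <- bc$x[which.max(bc$y)]
lambda_hat
```

On the real data, a lambda_hat well above 0 would point in the same direction as John's advice; the Q-Q plot of the chosen model should still be checked afterwards.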
Hello Tobias,

I am not sure what your wt variable is; I suspect a 'weight'. If it is a nonnegative measure, then you want a density model for positive values, not a normal density in the first place. I think you should try a Gamma GLM, and look at a Gamma Q-Q plot within each of your conditions. You could try the following:

  M1 = glm(wt ~ pes + origin + gender + gender:pes, family=Gamma(link=identity))
  M2 = glm(wt ~ pes + origin + gender + gender:pes, family=Gamma(link=log))
  M3 = glm(wt ~ pes + origin + gender + gender:pes, family=Gamma(link=inverse))

and see whether one of them fits better, in terms of Q-Q plot adjustment or comparative fit indices (AIC, BIC, ...).

HTH,

Yvonnick Noel, PhD
University of Brittany
France
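A sketch of how Yvonnick's three Gamma GLMs could be fitted and compared, with data = tobi added to his calls and a simulated, hypothetical stand-in `tobi` in which wt is Gamma-distributed (the shape value 50 is an arbitrary assumption). Because the three models share the same response and family, their AIC and BIC values are directly comparable. For the "Gamma Q-Q plot within each condition", one possible reading (an assumption, not necessarily what Yvonnick had in mind) is: if wt ~ Gamma(shape k, mean mu), then wt/mu ~ Gamma(shape k, rate k), so wt divided by the fitted mean can be plotted against Gamma quantiles within each origin x gender cell, with k estimated as 1/dispersion. The helper gamma_qq() is hypothetical.

```r
# Hypothetical, SIMULATED stand-in for the real data, with Gamma-distributed wt
set.seed(2)
origin <- factor(rep(c("captive", "site", "wild"), c(119, 149, 19)))
n <- length(origin)
gender <- factor(sample(c("female", "male"), n, replace = TRUE))
pes <- ifelse(gender == "male", rnorm(n, 152, 6), rnorm(n, 130, 5))
mu <- exp(-7.4 + 1.8 * log(pes))
wt <- rgamma(n, shape = 50, rate = 50 / mu)        # mean mu, coefficient of variation ~0.14
tobi <- data.frame(origin, wt, pes, gender)

# Yvonnick's three models, with an explicit data argument
form <- wt ~ pes + origin + gender + gender:pes
M1 <- glm(form, family = Gamma(link = "identity"), data = tobi)
M2 <- glm(form, family = Gamma(link = "log"), data = tobi)
M3 <- glm(form, family = Gamma(link = "inverse"), data = tobi)

# Same response and family, so information criteria can be compared across links
models <- list(identity = M1, log = M2, inverse = M3)
ic <- cbind(AIC = sapply(models, AIC), BIC = sapply(models, BIC))
ic

# Gamma Q-Q plot of wt / fitted mean within one cell (hypothetical helper)
gamma_qq <- function(model, cell, level) {
  keep <- cell == level
  if (sum(keep) < 3) return(invisible(NULL))       # too few animals to plot
  ratio <- (model$y / fitted(model))[keep]         # approx. Gamma(shape k, rate k)
  k <- 1 / summary(model)$dispersion               # moment estimate of the shape
  q <- qgamma(ppoints(sum(keep)), shape = k, rate = k)
  qqplot(q, ratio, xlab = "Gamma quantiles", ylab = "wt / fitted", main = level)
  abline(0, 1)                                     # points near this line = good Gamma fit
  invisible(q)
}

cells <- interaction(tobi$origin, tobi$gender)     # 6 origin x gender cells
op <- par(mfrow = c(2, 3))
for (lev in levels(cells)) gamma_qq(M2, cells, lev)
par(op)
```

With the identity or inverse link, glm() can occasionally stop with a request for starting values if a fitted mean leaves the valid range; passing start = coef(M2)-style values is not appropriate across links, so in that case start values on each link's own scale would be needed.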