Simple example of 5 groups of 4 replicates.

> set.seed(5)
> tmp <- rnorm(20)
> gp <- as.factor(rep(1:5, each=4))
> summary(glm(tmp ~ -1 + gp, data=data.frame(tmp, gp)))$coefficients
         Estimate Std. Error       t value  Pr(>|t|)
gp1 -0.1604613084  0.4899868 -0.3274809061 0.7478301
gp2  0.0002487984  0.4899868  0.0005077655 0.9996016
gp3  0.0695463698  0.4899868  0.1419352018 0.8890200
gp4 -0.6121682841  0.4899868 -1.2493567852 0.2306791
gp5 -0.6999545014  0.4899868 -1.4285171713 0.1736348
> m <- data.frame(tmp, gp)
> sapply(gp, function(x) sd(m[m[,"gp"]==x,1]))
 [1] 1.169284 1.169284 1.169284 1.169284 1.142974 1.142974 1.142974 1.142974
 [9] 0.862423 0.862423 0.862423 0.862423 0.535740 0.535740 0.535740 0.535740
[17] 1.047538 1.047538 1.047538 1.047538

Why doesn't the standard deviation of each group correlate with the Pr? For example, gp = 4 has the smallest sd (0.535740), but its Pr is not the lowest (0.23 vs. 0.1736 for gp = 5).

Another example with new tmp1:

> tmp1
 [1]  9.577969  9.310792  9.666767  9.610164 10.181692 10.155899 10.025943
 [8]  9.971243 10.177766  9.265793  9.415818 10.099874 10.238829  9.575591
[15]  9.560879  9.617891  9.617891 10.158160 10.592377 10.068443
> summary(glm(tmp1 ~ -1 + age, data=data.frame(as.vector(as.matrix(tmp1)), age)))$coefficients
      Estimate Std. Error  t value     Pr(>|t|)
age1  9.541423  0.1611603 59.20456 3.380085e-19
age2 10.083694  0.1611603 62.56935 1.479781e-19
age3  9.739813  0.1611603 60.43557 2.485380e-19
age4  9.748297  0.1611603 60.48821 2.453251e-19
age5 10.109218  0.1611603 62.72773 1.424913e-19
> m1 <- data.frame(tmp1, gp)
> sapply(age, function(x) sd(m1[m1[,"age"]==x,1]))
 [1] 0.1580745 0.1580745 0.1580745 0.1580745 0.1013207 0.1013207 0.1013207
 [8] 0.1013207 0.4658736 0.4658736 0.4658736 0.4658736 0.3279128 0.3279128
[15] 0.3279128 0.3279128 0.3995426 0.3995426 0.3995426 0.3995426

Can I conclude from the Pr values of the summary that tmp1 is of better "quality" than tmp, given that its Pr values are significantly smaller?
Bill.Venables at csiro.au
2008-Aug-20 07:03 UTC
[R] Understanding output of summary(glm(...))
The 'Std. Error' values listed in the coefficients table of the summary have nothing to do with the sub-class standard deviations. They are the standard errors associated with the estimates of the class means (the way you have fitted the model). Because the design has equal replication and the estimated standard errors are based on the pooled estimate of variance from all samples, they are all equal. That's why.

Your second 'example' was incomplete and I couldn't follow it, but the answer is almost certainly "hell no!".

Finally, a question for you. Why do you use glm(...) when all you are doing is fitting linear models? Either lm(...) or aov(...) would have been much more sensible.

Bill Venables
http://www.cmis.csiro.au/bill.venables/

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Daren Tan
Sent: Wednesday, 20 August 2008 4:37 PM
To: r-help at stat.math.ethz.ch
Subject: [R] Understanding output of summary(glm(...))
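The pooled-variance point in the reply can be checked by hand on the first example. The following is only a minimal sketch: the variable names (group_sd, pooled_sd, se_by_hand and so on) are illustrative and not from the original post, and the shortcut of averaging the group variances relies on every group having the same number of replicates (4 here).

```r
set.seed(5)
tmp <- rnorm(20)
gp  <- factor(rep(1:5, each = 4))

# Same model fitted two ways; glm() defaults to the gaussian family
fit_lm  <- lm(tmp ~ -1 + gp)
fit_glm <- glm(tmp ~ -1 + gp)

# Per-group standard deviations, one value per group
group_sd <- tapply(tmp, gp, sd)

# With equal group sizes, the pooled variance is the mean of the group variances
pooled_sd <- sqrt(mean(group_sd^2))

# Standard error of a group mean: pooled SD / sqrt(replicates per group)
se_by_hand <- pooled_sd / sqrt(4)

# Standard errors as reported in the coefficients tables
se_lm  <- coef(summary(fit_lm))[, "Std. Error"]
se_glm <- coef(summary(fit_glm))[, "Std. Error"]

print(group_sd)    # differs from group to group
print(se_by_hand)  # a single pooled value
print(se_lm)       # the same value repeated for every group
```

The individual group standard deviations enter the table only through the pooled estimate, which is why every row shows the same 'Std. Error'. The lm() and glm() fits should report the same standard errors, which is the sense in which glm() adds nothing for this model.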
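On the "quality" question, it may help to recall that each Pr(>|t|) in this table tests whether the corresponding group mean is zero, with t value = Estimate / Std. Error. The sketch below is an illustration under stated assumptions, not a reconstruction of the second example: it reuses the first example's data and adds an arbitrary constant of 10 (chosen only because the tmp1 values sit near 10) to show that the p-values collapse even though the spread of the data is untouched.

```r
set.seed(5)
tmp <- rnorm(20)
gp  <- factor(rep(1:5, each = 4))

# Same data moved away from zero; the offset of 10 is an arbitrary illustration
shifted <- tmp + 10

tab_orig    <- coef(summary(lm(tmp ~ -1 + gp)))
tab_shifted <- coef(summary(lm(shifted ~ -1 + gp)))

# The spread is unchanged, so the standard errors are identical
se_same <- isTRUE(all.equal(tab_orig[, "Std. Error"],
                            tab_shifted[, "Std. Error"]))

# The estimates are now far from zero, so every p-value shrinks
p_smaller <- all(tab_shifted[, "Pr(>|t|)"] < tab_orig[, "Pr(>|t|)"])

print(se_same)
print(p_smaller)
```

Small p-values here only say that the group means are far from zero relative to their standard error, so they are not a measure of data quality.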