Forgive what may seem to be a trivial question/problem.

Below is some simple R 1.2.1 (Windows) code with output.

> summary(mammals, digits=10)
     Name     Body.Weight      Brain.Weight
 Red Fox :1   Min.   :   3.0   Min.   :  26.0
 Pig     :1   1st Qu.:  35.5   1st Qu.: 138.5
 Man     :1   Median : 100.0   Median : 406.0
 Kangaroo:1   Mean   : 761.2   Mean   :1000.0
 Jaguar  :1   3rd Qu.: 493.0   3rd Qu.: 667.5
 Horse   :1   Max.   :6654.0   Max.   :5712.0
 (Other) :9
> mean(mammals[,3])
[1] 1000.467    # <- summary() reports it as 1000.0
> mean(mammals[,2])
[1] 761.2       # <- summary() reports it as 761.2

I'm puzzled why the Brain.Weight mean from summary() is different from
mean(mammals[,3]), while the Body.Weight means are identical in the two
functions. This isn't limited to R; I've observed the same thing in S-Plus
2000 (and v.6 beta).

I can get the "right" answer in S-Plus using the digits argument (setting
digits=8), but this argument doesn't seem to have any effect in R 1.2.1. I
*did* use it the way it is illustrated in the help file as well (e.g.

summary(mammals, digits=max(10, getOption("digits")))
)
with the same results as above.

So, I guess I have two questions:

1) Why does S (in both S-Plus and R 1.2.1) produce different values for
the means in the second variable but not the first?

2) Why does the digits argument seem not to have any effect in R 1.2.1's
summary()?

P.S. I also pasted the example code from the summary help file into the R
1.2.1 window. The digits argument doesn't change the results there either.

Dr. Marc R. Feldesman
email: feldesmanm at pdx.edu
email: feldesman at attglobal.net
fax: 503-725-3905

"Don't know where I'm going.
Don't like where I've been.
There may be no exit.
But hell, I'm going in."
Jimmy Buffett

Powered by Superchoerus - the 700 MHz Coppermine Box

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
On Wed, 31 Jan 2001, Marc R. Feldesman wrote:

> Forgive what may seem to be a trivial question/problem.
>
> Below is some simple R 1.2.1(Windows) code with output.
>
> > summary(mammals, digits=10)
>      Name     Body.Weight      Brain.Weight
>  Red Fox :1   Min.   :   3.0   Min.   :  26.0
>  Pig     :1   1st Qu.:  35.5   1st Qu.: 138.5
>  Man     :1   Median : 100.0   Median : 406.0
>  Kangaroo:1   Mean   : 761.2   Mean   :1000.0
>  Jaguar  :1   3rd Qu.: 493.0   3rd Qu.: 667.5
>  Horse   :1   Max.   :6654.0   Max.   :5712.0
>  (Other) :9
> > mean(mammals[,3])
> [1] 1000.467    # <- summary() reports it as 1000.0
> > mean(mammals[,2])
> [1] 761.2       # <- summary() reports it as 761.2
>
> I'm puzzled why the Brain.Weight mean from summary() is different from
> mean(mammals[,3]), while the Body.Weight means are identical in the two
> functions. This isn't limited to R; I've observed the same thing in S-Plus
> 2000 (and v.6 beta).

The results are to a certain number of significant figures, not decimal
places.

> I can get the "right" answer in S-Plus using the digits argument (setting
> digits=8), but this argument doesn't seem to have any effect in R 1.2.1. I
> *did* use it the way it is illustrated in the help file as well (e.g.
>
> summary(mammals, digits=max(10, getOption("digits")))
> )
> with the same results as above.
>
> So, I guess I have two questions:
>
> 1) Why does S (in both S-Plus and R 1.2.1) produce different values for
> the means in the second variable but not the first?

summary.default applies signif() to its results, by default to 4
significant digits.

> 2) Why does the digits argument seem not to have any effect in R 1.2.1's
> summary()?

Because R forgot to pass it down to summary.default.

> P.S. I also pasted the example code from the summary help file into the R
> 1.2.1 window. The digits argument doesn't change the results there either.

In R's summary.data.frame, digits is only used in formatting the result.
Replace

  z <- lapply(as.list(object), summary, maxsum = maxsum)

by

  z <- lapply(as.list(object), summary, maxsum = maxsum, digits = digits)

in R.

Brian

--
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
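A minimal R sketch of the two answers above, under stated assumptions: the means are the values printed in this thread (the mammals data frame is not reproduced), and col_summary(), frame_summary_buggy(), frame_summary_fixed() and the toy data frame are hypothetical stand-ins, not R's actual summary.default or summary.data.frame source. It does not call summary() itself, since where that rounding happens has changed across R versions. It shows signif() rounding to 4 significant digits, and why an argument that is accepted but not forwarded through lapply() has no effect.

```r
# Means as printed in this thread; the mammals data itself is not reproduced.
body_mean  <- 761.2
brain_mean <- 1000.4667

# Rounding to 4 significant digits, as summary.default did by default in R 1.2.1:
signif(body_mean, 4)    # 761.2 already fits in 4 significant digits
signif(brain_mean, 4)   # 1000.4667 becomes 1000

# Hypothetical stand-in for summary.default: round a few statistics with signif().
col_summary <- function(x, digits = 4) {
  signif(c(Min = min(x), Mean = mean(x), Max = max(x)), digits)
}

# Hypothetical stand-ins for summary.data.frame, before and after the fix.
frame_summary_buggy <- function(df, digits = 4) {
  lapply(as.list(df), col_summary)                   # digits is silently dropped
}
frame_summary_fixed <- function(df, digits = 4) {
  lapply(as.list(df), col_summary, digits = digits)  # lapply() forwards it to FUN
}

# Hypothetical toy data whose mean is 1000.4666...
toy <- data.frame(w = c(1000, 1000.4, 1001))
print(frame_summary_buggy(toy, digits = 8)$w["Mean"], digits = 10)  # still 1000
print(frame_summary_fixed(toy, digits = 8)$w["Mean"], digits = 10)  # 1000.4667
```

The one-line fix works because lapply() passes any extra named arguments on to the function it applies, so digits reaches the per-column summary only when it is named in the lapply() call.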
Below is output after fixing summary.data.frame as you suggest. This
output now matches that in S-Plus 2000 and S-Plus 6.0 (Win Beta 2).

However, in light of the issue of significant digits, there still seems to
be an inconsistency here (both in R and S dialects). All the values for
body weight print (?have) one decimal digit, while all the values for brain
weight print (?have) 4. Since all the original values in the data file are
recorded without decimal digits at all, I find it strange that the (for
example) minimum for body weight is 3.0, while the minimum for brain weight
is 26.0000. They're 3 and 26, respectively, in the original data
file. Why should one be reported to one decimal digit and the other to
4? This pattern follows throughout.

I don't think this is an R problem since a similar pattern (but with 3
decimal digits) occurs in S-Plus.

S-Plus 6.0 (Win, Beta) OUTPUT

> summary(mammals, digits=8)
 Body.Weight       Brain.Weight
    Min.:   3.0       Min.:  26.000
 1st Qu.:  35.5    1st Qu.: 138.500
  Median: 100.0     Median: 406.000
    Mean: 761.2       Mean:1000.467
 3rd Qu.: 493.0    3rd Qu.: 667.500
    Max.:6654.0       Max.:5712.000
>

Obviously this isn't a giant problem, but one that a student first brought
to my attention and I've been scratching my head trying to puzzle it out
ever since.

R-1.2.1 (Windows, Binary after Brian Ripley's code fix)

> summary(mammals, digits=8)
     Name     Body.Weight      Brain.Weight
 Red Fox :1   Min.   :   3.0   Min.   :  26.0000
 Pig     :1   1st Qu.:  35.5   1st Qu.: 138.5000
 Man     :1   Median : 100.0   Median : 406.0000
 Kangaroo:1   Mean   : 761.2   Mean   :1000.4667
 Jaguar  :1   3rd Qu.: 493.0   3rd Qu.: 667.5000
 Horse   :1   Max.   :6654.0   Max.   :5712.0000
 (Other) :9

At 08:04 AM 2/1/01 +0000, Prof Brian D Ripley wrote:
>[...]
> Date: Thu, 01 Feb 2001 09:47:02 -0800
> To: Prof Brian D Ripley <ripley at stats.ox.ac.uk>
> From: "Marc R. Feldesman" <feldesmanm at pdx.edu>
> Subject: Re: [R] summary() vs mean()
> Cc: <r-help at stat.math.ethz.ch>
>
> Below is output after fixing summary.data.frame as you suggest. This
> output now matches that in SPlus 2000 and SPlus 6.0 (Win Beta 2).
>
> However, in light of the issue of significant digits, there still seems to
> be an inconsistency here (both in R and S dialects). All the values for
> body weight print (?have) one decimal digit, while all the values for brain
> weight print (?have) 4. Since all the original values in the data file are
> recorded without decimal digits at all, I find it strange that the (for
> example) minimum for body weight is 3.0, while the minimum for brain weight
> is 26.0000. They're 3 and 26, respectively, in the original data
> file. Why should one be reported to one decimal digit and the other to
> 4? This pattern follows throughout.

[The fix I committed is different (passing digits=12, in fact).]

The issue is that the whole column is printed to 8 significant figures
(which you asked for, but got 7 in S+6.0) and all the numbers are aligned
on the decimal point. The column is formatted, not each number, which is
what one usually wants. As the mean really is 761.2 (I presume, since
there are 15 animals), only one decimal place is needed to give 8
significant figures.

> I don't think this is an R problem since a similar pattern (but with 3
> decimal digits) occurs in S-Plus.
>
>
> S-Plus 6.0 (Win, Beta) OUTPUT
> > summary(mammals, digits=8)
>  Body.Weight       Brain.Weight
>     Min.:   3.0       Min.:  26.000
>  1st Qu.:  35.5    1st Qu.: 138.500
>   Median: 100.0     Median: 406.000
>     Mean: 761.2       Mean:1000.467
>  3rd Qu.: 493.0    3rd Qu.: 667.500
>     Max.:6654.0       Max.:5712.000
> >
>
>
> Obviously this isn't a giant problem, but one that a student first brought
> to my attention and I've been scratching my head trying to puzzle it out
> ever since.
>
> R-1.2.1 (Windows, Binary after Brian Ripley's code fix)
>
> > summary(mammals, digits=8)
>      Name     Body.Weight      Brain.Weight
>  Red Fox :1   Min.   :   3.0   Min.   :  26.0000
>  Pig     :1   1st Qu.:  35.5   1st Qu.: 138.5000
>  Man     :1   Median : 100.0   Median : 406.0000
>  Kangaroo:1   Mean   : 761.2   Mean   :1000.4667
>  Jaguar  :1   3rd Qu.: 493.0   3rd Qu.: 667.5000
>  Horse   :1   Max.   :6654.0   Max.   :5712.0000
>  (Other) :9

[...]

--
Brian D. Ripley,                  ripley at stats.ox.ac.uk
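Ripley's point that the column, not each number, is formatted can be checked with base R's format(), which picks one number of decimal places for a whole numeric vector: enough for the element that needs the most to show up to `digits` significant figures. A small sketch, assuming the six statistics per column are the values printed in this thread (they are not recomputed from the data):

```r
# Min, 1st Qu., Median, Mean, 3rd Qu., Max as printed in the digits = 8 output.
body  <- c(3, 35.5, 100, 761.2, 493, 6654)
brain <- c(26, 138.5, 406, 1000.4667, 667.5, 5712)

# 761.2 needs only one decimal place to show its significant figures,
# so the whole Body.Weight column gets one decimal place.
format(body, digits = 8)
# "   3.0" "  35.5" " 100.0" " 761.2" " 493.0" "6654.0"

# 1000.4667 needs 4 decimal places to reach 8 significant figures,
# so every Brain.Weight entry is padded to 4 decimal places.
format(brain, digits = 8)
# "  26.0000" " 138.5000" " 406.0000" "1000.4667" " 667.5000" "5712.0000"

# At 7 significant figures the mean becomes 1000.467 and the column gets
# 3 decimal places, the same layout as the S-Plus 6.0 output above.
format(brain, digits = 7)
# "  26.000" " 138.500" " 406.000" "1000.467" " 667.500" "5712.000"
```

So the 26.0000 is not a claim that the minimum was measured to four decimals; it is padding so the column lines up on the decimal point.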