I've just started using R and am still a neophyte, but I found the following curious result. I'm using the current version of R (2.5.1 (2007-06-27)).

Why are the results for the third quartile different in the output from the summary and fivenum commands? For the following data set

457 514 530 530 538 560 687 745 745 778 786 790 792 821 821 822 822
828 845 850 886 886 886 913 1050 1050 1065 1065 1065 1065 1090 1130

Summary yields:

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  457.0   745.0   822.0   825.4   947.2  1130.0

While fivenum yields:

[1]  457.0  745.0  822.0  981.5 1130.0

The third quartile is being correctly calculated in the fivenum command and incorrectly in the summary command.

Bob

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Robert L. Schaefer, Professor of Statistics
Department of Mathematics and Statistics
Miami University
Oxford, Ohio 45056
(513) 529-3533
(513) 529-5818 (sec)
SchaefRL at MUOhio.Edu
HTTP://WWW.USERS.MUOHIO.EDU/SchaefRL
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Schaefer, Robert L. Dr. <schaefrl <at> muohio.edu> writes:

> Why are the results for the third quartile different in the output
> from the summary and fivenum commands?
> [...]
> The third quartile is being correctly calculated in the fivenum
> command and incorrectly in the summary command.
>
> Bob

If you look in ?boxplot.stats, it says:

     The two 'hinges' are versions of the first and third quartile,
     i.e., close to quantile(x, c(1,3)/4). The hinges equal the
     quartiles for odd n (where n <- length(x)) and differ for even
     n. Where the quartiles only equal observations for n %% 4 == 1
     (n = 1 mod 4), the hinges do so additionally for n %% 4 == 2
     (n = 2 mod 4), and are in the middle of two observations
     otherwise.

I got here by looking at summary.default and seeing that it uses the quantile function, and then looking at fivenum to see that it does not. Looking at the help for fivenum led me to boxplot.stats, where I saw that it was not necessarily doing the same thing.

HTH

--
Ken Knoblauch
Inserm U846
Institut Cellule Souche et Cerveau
Département Neurosciences Intégratives
18 avenue du Doyen Lépine
69500 Bron
France
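To make the difference concrete, here is a minimal sketch that recomputes both numbers by hand for the posted data. It assumes summary() relies on the default quantile() definition (type 7, linear interpolation at position (n - 1) * p + 1) and that the upper hinge is found at depth floor((n + 3) / 2) / 2 from the top, which is how I read the fivenum source. The variable names (xs, h, depth, pos) are only illustrative.

```r
# Data from the original post (n = 32, an even number)
x <- c(457, 514, 530, 530, 538, 560, 687, 745, 745, 778, 786, 790, 792,
       821, 821, 822, 822, 828, 845, 850, 886, 886, 886, 913, 1050, 1050,
       1065, 1065, 1065, 1065, 1090, 1130)
xs <- sort(x)
n  <- length(xs)

# Third quartile under the default quantile() definition (type 7):
# interpolate linearly at position (n - 1) * p + 1.
h  <- (n - 1) * 0.75 + 1                  # 24.25
lo <- floor(h)
q3 <- xs[lo] + (h - lo) * (xs[lo + 1] - xs[lo])

# Upper hinge: average of the observations on either side of position
# n + 1 - depth, where depth = floor((n + 3) / 2) / 2.
depth <- floor((n + 3) / 2) / 2           # 8.5
pos   <- n + 1 - depth                    # 24.5
hinge <- 0.5 * (xs[floor(pos)] + xs[ceiling(pos)])

q3      # 947.25, printed by summary() as 947.2
hinge   # 981.5, the fourth value of fivenum(x)
```

Both values lie between the 24th observation (913) and the 25th (1050). The quartile sits a quarter of the way across that gap, while the hinge sits exactly in the middle.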
Please read the relevant help pages:

?fivenum
?boxplot.stats

Hint: Length of your data vector is an "even" number.

Ravi.

-----------------------------------------------------------------------------------
Ravi Varadhan, Ph.D.
Assistant Professor, The Center on Aging and Health
Division of Geriatric Medicine and Gerontology
Johns Hopkins University
Ph: (410) 502-2619
Fax: (410) 614-9625
Email: rvaradhan at jhmi.edu
Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html
-----------------------------------------------------------------------------------

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Schaefer, Robert L. Dr.
Sent: Tuesday, October 09, 2007 11:20 AM
To: r-help at stat.math.ethz.ch
Subject: [R] Summary vs fivenum results for Q3

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
On Tue, 9 Oct 2007, Schaefer, Robert L. Dr. wrote:

> I've just started using R and am still a neophyte, but I found the
> following curious result. I'm using the current version of R (2.5.1
> (2007-06-27) ).
>
> Why are the results for the third quartile different in the output from
> the summary and fivenum commands?

Because there are lots of ways to define quantiles. The quantile() function provides 9 definitions of quantiles, and in your data these range from 913 to 1015.75.

You might also get a hint that there are deliberately different definitions involved from the help page for fivenum(), which doesn't use the word "quartile" at all.

Everyone agrees that the third quartile (and the upper hinge) should be somewhere between the 24th and 25th of 32 observations, but not on which point in this interval should be chosen.

	-thomas

Thomas Lumley			Assoc. Professor, Biostatistics
tlumley at u.washington.edu	University of Washington, Seattle
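A short sketch of how one might check the spread across definitions: it asks quantile() for the 0.75 quantile under each of its nine type values. The name q3_by_type is just an illustrative choice, and the data vector is retyped from the original post.

```r
# Data from the original post
x <- c(457, 514, 530, 530, 538, 560, 687, 745, 745, 778, 786, 790, 792,
       821, 821, 822, 822, 828, 845, 850, 886, 886, 886, 913, 1050, 1050,
       1065, 1065, 1065, 1065, 1090, 1130)

# Third quartile under each of the nine definitions offered by quantile()
q3_by_type <- sapply(1:9, function(type) {
  quantile(x, probs = 0.75, type = type, names = FALSE)
})
names(q3_by_type) <- paste("type", 1:9)

q3_by_type
range(q3_by_type)   # 913.00 1015.75, the range quoted in the message above
```

Type 7 is the default, so it is the one that matches the 3rd Qu. value reported by summary(). Every one of the nine results falls between the 24th and 25th ordered observations (913 and 1050).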