thr3ads.net - R help - [R] quantile function [Feb 2004]

Giovanni Petris

2004-Feb-06 15:30 UTC

[R] quantile function

I am trying to `cut' a continuous variable into contiguous classes
containing approximately an equal number of observations. I thought
quantile() was the appropriate function to use in order to find the
breakpoints, but I end up with classes of different sizes - see
example below. Does anybody have an explanation for that? And what is
the `recommended' way of computing what I am looking for?

Example:
> ca$age
 [1] 28 42 46 45 34 44 48 45 38 45 49 45 41 46 49 46 44 48 52 48 45 50 53 57 46
[26] 52 54 57 47 52 55 59 50 54 57 60 51 55 46 63 51 59 48 35 53 59 57 37 55 32
[51] 60 43 59 37 30 47 60 38 34 48 32 38 36 49 33 42 38 58 35 43 39 59 39 43 42
[76] 60 40 44
> table(cut(ca$age,breaks=c(-Inf,quantile(ca$age, seq(0,1,length=11)[-1]))))

(-Inf,35] (35,38.4] (38.4,43]   (43,45] (45,46.5] (46.5,49]   (49,52]   (52,55] 
        9         7        10         8         5        10         7         7 
  (55,59]   (59,63] 
       10         5 

Thanks in advance,
Giovanni

-- 

 __________________________________________________
[                                                  ]
[ Giovanni Petris                 GPetris at uark.edu ]
[ Department of Mathematical Sciences              ]
[ University of Arkansas - Fayetteville, AR 72701  ]
[ Ph: (479) 575-6324, 575-8630 (fax)               ]
[ http://definetti.uark.edu/~gpetris/              ]
[__________________________________________________]

Thomas Lumley

2004-Feb-06 16:04 UTC

[R] quantile function

On Fri, 6 Feb 2004, Giovanni Petris wrote:
>
> I am trying to `cut' a continuous variable into contiguous classes
> containing approximately an equal number of observations. I thought
> quantile() was the appropriate function to use in order to find the
> breakpoints, but I end up with classes of different sizes - see
> example below. Does anybody have an explanation for that? And what is
> the `recommended' way of computing what I am looking for?
Your variable is actually quite discrete, which is causing the problem.
For example, you have two 35s, so the lower groups could only be equal if one
35 was in one group and the other in the other group.
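
The effect of the ties can be checked directly. The following is a minimal sketch, not part of the original reply: it re-enters the posted ages as a plain vector called `age` (an illustrative name; the question uses `ca$age`), rebuilds the table from the question, and counts the observations at the first breakpoint.

```r
# Ages as posted in the question (78 values); `age` is an illustrative name.
age <- c(28, 42, 46, 45, 34, 44, 48, 45, 38, 45, 49, 45, 41, 46, 49, 46, 44,
         48, 52, 48, 45, 50, 53, 57, 46, 52, 54, 57, 47, 52, 55, 59, 50, 54,
         57, 60, 51, 55, 46, 63, 51, 59, 48, 35, 53, 59, 57, 37, 55, 32, 60,
         43, 59, 37, 30, 47, 60, 38, 34, 48, 32, 38, 36, 49, 33, 42, 38, 58,
         35, 43, 39, 59, 39, 43, 42, 60, 40, 44)

# Decile breakpoints, built the same way as in the question.
brks <- c(-Inf, quantile(age, seq(0, 1, length.out = 11)[-1]))
counts <- table(cut(age, breaks = brks))
print(counts)

# The first breakpoint is 35, and both 35s fall on the same side of it.
n_35    <- sum(age == 35)   # number of tied observations at the breakpoint: 2
n_le_35 <- sum(age <= 35)   # size of the first class: 9
```

Because `cut()` assigns classes by value alone, every observation tied with a breakpoint lands in the same class, which is why the first class holds 9 observations rather than about 8.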

Now, if you want the groups to be equal even at the cost of not depending
just on the value, there are at least two possible approaches:
 - break ties randomly, for example by jitter()ing the data first
 - order the data by age and then take the first 8, next 8, and so on.
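
Both approaches can be sketched as follows. This is an illustration under stated assumptions rather than code from the reply: the vector name `age`, the seed, and the variable names are all chosen here. With its default settings `jitter()` perturbs these integer ages by less than the gap between distinct values, so only the order among ties is randomized.

```r
# Ages as posted in the question (78 values); `age` is an illustrative name.
age <- c(28, 42, 46, 45, 34, 44, 48, 45, 38, 45, 49, 45, 41, 46, 49, 46, 44,
         48, 52, 48, 45, 50, 53, 57, 46, 52, 54, 57, 47, 52, 55, 59, 50, 54,
         57, 60, 51, 55, 46, 63, 51, 59, 48, 35, 53, 59, 57, 37, 55, 32, 60,
         43, 59, 37, 30, 47, 60, 38, 34, 48, 32, 38, 36, 49, 33, 42, 38, 58,
         35, 43, 39, 59, 39, 43, 42, 60, 40, 44)
n <- length(age)

# Approach 1: break ties randomly, then cut at the deciles of the jittered data.
set.seed(1)  # arbitrary seed, only so the sketch is reproducible
age_j <- jitter(age)
brks <- quantile(age_j, seq(0, 1, length.out = 11))
by_jitter <- cut(age_j, breaks = brks, include.lowest = TRUE)
jitter_sizes <- as.vector(table(by_jitter))
print(jitter_sizes)

# Approach 2: order by age and take the first 8, the next 8, and so on.
block <- numeric(n)
block[order(age)] <- (seq_len(n) - 1) %/% 8 + 1
block_sizes <- as.vector(table(block))
print(block_sizes)
```

With 78 observations the second approach gives nine blocks of 8 and a final block of 6. In both cases two people of the same age can end up in different classes, which is the cost mentioned above.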

	-thomas

Thomas Lumley			Assoc. Professor, Biostatistics
tlumley at u.washington.edu	University of Washington, Seattle

Frank E Harrell Jr

2004-Feb-06 17:16 UTC

[R] quantile function

On Fri, 6 Feb 2004 09:30:31 -0600 (CST)
Giovanni Petris <GPetris at uark.edu> wrote:
> 
> I am trying to `cut' a continuous variable into contiguous classes
> containing approximately an equal number of observations. I thought
> quantile() was the appropriate function to use in order to find the
> breakpoints, but I end up with classes of different sizes - see
> example below. Does anybody have an explanation for that? And what is
> the `recommended' way of computing what I am looking for?
The cut2 function in the Hmisc package tries to do this the best it can.
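
A minimal usage sketch, assuming the Hmisc package is installed; the vector name `age` and the choice `g = 10` (ten quantile groups, to match the question) are illustrative and do not come from the reply.

```r
# Ages as posted in the question (78 values); `age` is an illustrative name.
age <- c(28, 42, 46, 45, 34, 44, 48, 45, 38, 45, 49, 45, 41, 46, 49, 46, 44,
         48, 52, 48, 45, 50, 53, 57, 46, 52, 54, 57, 47, 52, 55, 59, 50, 54,
         57, 60, 51, 55, 46, 63, 51, 59, 48, 35, 53, 59, 57, 37, 55, 32, 60,
         43, 59, 37, 30, 47, 60, 38, 34, 48, 32, 38, 36, 49, 33, 42, 38, 58,
         35, 43, 39, 59, 39, 43, 42, 60, 40, 44)

if (requireNamespace("Hmisc", quietly = TRUE)) {
  # g = 10 asks cut2() for roughly ten quantile groups.
  groups <- Hmisc::cut2(age, g = 10)
  print(table(groups))
} else {
  message("Hmisc is not installed; run install.packages('Hmisc') first.")
}
```

Since cut2() still assigns classes by value, tied ages stay together, so the group sizes should be expected to be only approximately equal.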

Frank

---
Frank E Harrell Jr   Professor and Chair           School of Medicine
                     Department of Biostatistics   Vanderbilt University

Knut M. Wittkowski

2004-Feb-06 17:19 UTC

[R] quantile function

Another problem with the R function "quantile" is that its definition of
"quantiles" may not be what you expect. Consider the following:

 > x <- matrix(c(1:4))
 > quantile(x,c(0,.25,.5,.75,1))
   0%  25%  50%  75% 100%
1.00 1.75 2.50 3.25 4.00

 > x <- matrix(c(1:6))
 > quantile(x,c(0,.25,.5,.75,1))
   0%  25%  50%  75% 100%
1.00 2.25 3.50 4.75 6.00

 > x <- matrix(c(1:8))
 > quantile(x,c(0,.25,.5,.75,1))
   0%  25%  50%  75% 100%
1.00 2.75 4.50 6.25 8.00
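
These values come from linear interpolation between order statistics at position (n - 1) * p + 1, which is the default definition in current R (type 7). A minimal sketch that recomputes the first example by hand; the variable names are illustrative:

```r
x <- 1:4
p <- c(0, 0.25, 0.5, 0.75, 1)

xs <- sort(x)
h  <- (length(xs) - 1) * p + 1   # fractional position in the sorted data
lo <- floor(h)
hi <- ceiling(h)

# Interpolate linearly between the neighbouring order statistics.
manual <- xs[lo] + (h - lo) * (xs[hi] - xs[lo])
print(manual)                    # 1.00 1.75 2.50 3.25 4.00

# Same values as the default quantile() call.
builtin <- unname(quantile(x, p))
print(builtin)
```

Under this definition the 0% and 100% quantiles are always the minimum and maximum, and the interior quantiles are spread evenly over the n - 1 gaps between observations rather than over the n observations.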

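With your implicit definition of quantiles (splitting the data set into 
four classes of equal size), each class should contain n/4 observations 
(1, 1.5, and 2 for the three examples above), so the quantiles should be 
as follows (equalSizeClasses is a hypothetical function, not part of R):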

 > x <- matrix(c(1:4))
 > equalSizeClasses(x,c(0,.25,.5,.75,1))
   0%  25%  50%  75% 100%
-Inf  1.50 2.50 3.50 +Inf

 > x <- matrix(c(1:6))
 > equalSizeClasses(x,c(0,.25,.5,.75,1))
   0%  25%  50%  75% 100%
-Inf  2.00 3.50 5.00 +Inf

 > x <- matrix(c(1:8))
 > equalSizeClasses(x,c(0,.25,.5,.75,1))
   0%  25%  50%  75% 100%
-Inf  2.50 4.50 6.50 +Inf
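
One way to obtain these values is sketched below. It assumes a version of R in which quantile() accepts a `type` argument, and it assumes that type 5 (which places the p-th quantile at position n * p + 0.5 in the sorted data) is an acceptable stand-in for the intended definition; it reproduces the interior values listed above for these three examples. The function body is illustrative, and the infinite end points are set by hand.

```r
# Hypothetical helper: breakpoints for classes of (nearly) equal size.
equalSizeClasses <- function(x, probs) {
  q <- quantile(x, probs, type = 5)  # assumption: type 5 matches the intent
  q[probs == 0] <- -Inf              # open the lowest class to the left
  q[probs == 1] <- Inf               # open the highest class to the right
  q
}

p <- c(0, 0.25, 0.5, 0.75, 1)
q4 <- equalSizeClasses(1:4, p)
q6 <- equalSizeClasses(1:6, p)
q8 <- equalSizeClasses(1:8, p)
print(q4)
print(q6)
print(q8)
```

The result can be passed to cut() as `breaks`. Ties in the data will still produce unequal classes, as discussed earlier in the thread.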

Knut

Knut M. Wittkowski, PhD,DSc
------------------------------------------
The Rockefeller University, GCRC
Experimental Design and Biostatistics
1230 York Ave #121B, Box 322, NY,NY 10021
+1(212)327-7175, +1(212)327-8450 (Fax)
kmw at rockefeller.edu
http://www.rucares.org/clinicalresearch/dept/biometry/
