thr3ads.net - R help - [R] grouping [Apr 2012]

Val

2012-Apr-03 12:47 UTC

[R] grouping

Hi all,

Assume that I have the following 10 data points.
 x <- c(46, 125, 36, 193, 209, 78, 66, 242, 297, 45)

Sorting x gives the following:
  y= (36, 45, 46, 66, 78, 125, 193, 209, 242, 297)

I want to group the sorted data points (y) into groups with (as nearly as
possible) an equal number of observations. In this case there will be three
groups: the first two groups will have three observations each and the third
will have four observations.

group 1  = 36, 45, 46
group 2  = 66, 78, 125
group 3  = 193, 209, 242, 297

Finally, I want to calculate the group means:

group 1  =  42
group 2  =  87
group 3  =  234

Can anyone help me out?

In SAS I used to do this with PROC RANK.

Thanks in advance,

Val
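As a quick check of the targets listed above, here is a minimal sketch that builds the three groups directly from the sorted values, assuming the group sizes (3, 3, 4) are known in advance; `grp`, `groups` and `means` are illustrative names, not taken from the thread. With these groups the means work out to about 42.33, 89.67 and 235.25 rather than the figures given.

```r
# Sorted data from the question (note the c() call)
y <- c(36, 45, 46, 66, 78, 125, 193, 209, 242, 297)

# Assumed group sizes: three, three, then four observations
grp <- rep(1:3, times = c(3, 3, 4))

groups <- split(y, grp)          # list with one element per group
means  <- sapply(groups, mean)   # named numeric vector of group means
print(groups)
print(means)
```

For unsorted data the same labels can be applied to `sort(x)`.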


David Winsemius

2012-Apr-03 13:10 UTC

[R] grouping

On Apr 3, 2012, at 8:47 AM, Val wrote:
> Hi all,
>
> Assume that I have the following 10 data points.
> x <- c(46, 125, 36, 193, 209, 78, 66, 242, 297, 45)
>
> Sorting x gives the following:
>  y= (36, 45, 46, 66, 78, 125, 193, 209, 242, 297)
The methods below do not require a sorting step.
>
> I want to group the sorted data points (y) into groups with (as nearly as
> possible) an equal number of observations. In this case there will be three
> groups: the first two groups will have three observations each and the third
> will have four observations.
>
> group 1  = 36, 45, 46
> group 2  = 66, 78, 125
> group 3  = 193, 209, 242, 297
>
> Finally, I want to calculate the group means:
>
> group 1  =  42
> group 2  =  87
> group 3  =  234
I hope those weren't answers from SAS.
>
> Can anyone help me out?
>
I usually do this with Hmisc::cut2, since its `g = <n>` parameter
automatically applies the quantile splitting criterion, but the following
is done in base R:

> split(x, cut(x, quantile(x, probs = c(0, .333, .66, 1)), include.lowest = TRUE))
$`[36,65.9]`
[1] 36 45 46

$`(65.9,189]`
[1]  66  78 125

$`(189,297]`
[1] 193 209 242 297


> lapply(split(x, cut(x, quantile(x, probs = c(0, .333, .66, 1)), include.lowest = TRUE)), mean)
$`[36,65.9]`
[1] 42.33333

$`(65.9,189]`
[1] 89.66667

$`(189,297]`
[1] 235.25

Or to get a table instead of a list:
> tapply(x, cut(x, quantile(x, probs = c(0, .333, .66, 1)), include.lowest = TRUE), mean)
  [36,65.9] (65.9,189]  (189,297]
   42.33333   89.66667  235.25000
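The groups produced this way depend on exactly where the quantile breaks fall. A small check, not from the thread: with exact thirds, the breaks land on the data values 66 and 193, and because cut() intervals are right-closed by default those values join the lower groups, giving sizes 4, 3, 3 instead of the 3, 3, 4 obtained with the probabilities above. `group_sizes()` is a hypothetical helper written only for this illustration.

```r
x <- c(46, 125, 36, 193, 209, 78, 66, 242, 297, 45)

# Hypothetical helper: count how many observations cut() puts in each
# quantile-based group for the given probabilities
group_sizes <- function(x, probs) {
  g <- cut(x, quantile(x, probs = probs), include.lowest = TRUE)
  as.vector(table(g))
}

group_sizes(x, c(0, .333, .66, 1))  # probabilities used above
group_sizes(x, c(0, 1/3, 2/3, 1))   # exact thirds: breaks coincide with 66 and 193
```

Passing `right = FALSE` to cut() would instead put a value sitting exactly on an interior break into the upper group.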
> In SAS I used to do it using proc rank.
?quantile isn't equivalent to PROC RANK, but quantile() provides a useful
basis for splitting or tabulating functions.
>
> thanks in advance
>
> Val
David Winsemius, MD
West Hartford, CT

Giovanni Petris

2012-Apr-03 13:13 UTC

[R] grouping

Probably something along the following lines:
> x <- c(  46, 125 , 36 ,193, 209, 78, 66, 242 , 297 , 45)
> sorted <- c(36 , 45 , 46,  66, 78,  125,193, 209, 242, 297)
> tapply(sorted, INDEX = (seq_along(sorted) - 1) %/% 3, FUN = mean)
        0         1         2         3 
 42.33333  89.66667 214.66667 297.00000 
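Because 10 is not divisible by 3, the positional index above yields a fourth group holding only 297. A minimal sketch of one way to fold the leftover point into the last group, assuming k = 3 groups are wanted, is to cap the index with pmin(); `k`, `size`, `idx` and `res` are illustrative names.

```r
sorted <- c(36, 45, 46, 66, 78, 125, 193, 209, 242, 297)

k    <- 3                        # number of groups wanted (assumed)
size <- length(sorted) %/% k     # base group size: 3 here

# Uncapped, the index is 0,0,0,1,1,1,2,2,2,3; capping it at k - 1
# makes the leftover observation join the last group
idx <- pmin((seq_along(sorted) - 1) %/% size, k - 1)

res <- tapply(sorted, INDEX = idx, FUN = mean)
print(res)
```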

Hope this helps,
Giovanni

-- 

Giovanni Petris  <GPetris at uark.edu>
Associate Professor
Department of Mathematical Sciences
University of Arkansas - Fayetteville, AR 72701
Ph: (479) 575-6324, 575-8630 (fax)
http://definetti.uark.edu/~gpetris/

R. Michael Weylandt

2012-Apr-03 13:13 UTC

[R] grouping

Ignoring the fact that your desired answers are wrong, I'd split the
grouping and the group-mean calculation into three steps:

i) quantile() can give you the split points;
ii) findInterval() can assign each y to a group;
iii) ave() or tapply() will then compute the group-wise means.

Something like:

y <- c(36, 45, 46, 66, 78, 125, 193, 209, 242, 297) # You need a "c" here.
ave(y, findInterval(y, quantile(y, c(0.33, 0.66))))
tapply(y, findInterval(y, quantile(y, c(0.33, 0.66))), mean)

You could also use cut2 from the Hmisc package to combine findInterval
and quantile into a single step.

Whether to use ave() or tapply() depends on your desired output.
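To illustrate the difference in output, here is a minimal sketch reusing the split points from the reply (0.33 and 0.66); `grp`, `per_obs` and `per_group` are illustrative names. ave() returns a vector as long as y, with each value replaced by its group mean, whereas tapply() returns one mean per group.

```r
y <- c(36, 45, 46, 66, 78, 125, 193, 209, 242, 297)

# Group index 0, 1 or 2 from the quantile split points
grp <- findInterval(y, quantile(y, c(0.33, 0.66)))

per_obs   <- ave(y, grp)            # length 10: group mean for every element
per_group <- tapply(y, grp, mean)   # length 3: one mean per group

print(per_obs)
print(per_group)
```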

Hope that helps,
Michael


K. Elo

2012-Apr-03 13:28 UTC

[R] grouping

Hi!

Maybe not the most elegant solution, but it works:

data <- c(46, 125, 36, 193, 209, 78, 66, 242, 297, 45)  # the x from the original post
s <- sort(data)
starts <- seq(1, by = 3, length.out = length(s) %/% 3)
for (i in starts) {
  # groups of three; the last group also takes any leftover points
  idx <- if (i == max(starts)) i:length(s) else i:(i + 2)
  print(s[idx])
  print(mean(s[idx]))
}

Produces:

[1] 36 45 46
[1] 42.33333
[1]  66  78 125
[1] 89.66667
[1] 193 209 242 297
[1] 235.25

HTH,
Kimmo
