Hi all,

Assume that I have the following 10 data points.

x=c( 46, 125 , 36 ,193, 209, 78, 66, 242 , 297 , 45)

Sort x and get the following:

y= (36 , 45 , 46, 66, 78, 125,193, 209, 242, 297)

I want to group the sorted data points (y) so that each group has an equal number of observations. In this case there will be three groups. The first two groups will have three observations each and the third will have four observations.

group 1 = 34, 45, 46
group 2 = 66, 78, 125
group 3 = 193, 209, 242,297

Finally I want to calculate the group means:

group 1 = 42
group 2 = 87
group 3 = 234

Can anyone help me out?

In SAS I used to do it using proc rank.

Thanks in advance,
Val
On Apr 3, 2012, at 8:47 AM, Val wrote:

> Hi all,
>
> Assume that I have the following 10 data points.
> x=c( 46, 125 , 36 ,193, 209, 78, 66, 242 , 297 , 45)
>
> sort x and get the following
> y= (36 , 45 , 46, 66, 78, 125,193, 209, 242, 297)

The methods below do not require a sorting step.

> I want to group the sorted data point (y) into equal number of
> observation per group. In this case there will be three groups. The first
> two groups will have three observation and the third will have four
> observations
>
> group 1 = 34, 45, 46
> group 2 = 66, 78, 125
> group 3 = 193, 209, 242,297
>
> Finally I want to calculate the group mean
>
> group 1 = 42
> group 2 = 87
> group 3 = 234

I hope those weren't answers from SAS.

> Can anyone help me out?

I usually do this with Hmisc::cut2, since it has a `g = <n>` parameter that auto-magically calls the quantile splitting criterion, but this is done in base R:

> split(x, cut(x, quantile(x, probs=c(0, .333, .66, 1)), include.lowest=TRUE))
$`[36,65.9]`
[1] 36 45 46

$`(65.9,189]`
[1]  66  78 125

$`(189,297]`
[1] 193 209 242 297

> lapply(split(x, cut(x, quantile(x, probs=c(0, .333, .66, 1)), include.lowest=TRUE)), mean)
$`[36,65.9]`
[1] 42.33333

$`(65.9,189]`
[1] 89.66667

$`(189,297]`
[1] 235.25

Or to get a table instead of a list:

> tapply(x, cut(x, quantile(x, probs=c(0, .333, .66, 1)), include.lowest=TRUE), mean)
 [36,65.9] (65.9,189]  (189,297]
  42.33333   89.66667  235.25000

> In SAS I used to do it using proc rank.

?quantile isn't equivalent to Proc Rank, but it will provide a useful basis for splitting or tabling functions.

> thanks in advance
>
> Val
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius, MD
West Hartford, CT
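Since the quantile breaks in David's reply are interpolated values rather than data points, it may be worth confirming that they really give the 3/3/4 split asked for. A minimal sketch, assuming the same probabilities (0, .333, .66, 1); the names `breaks`, `grp` and `sizes` are only illustrative:

```r
x <- c(46, 125, 36, 193, 209, 78, 66, 242, 297, 45)

# Break points at the probabilities used in the reply (no sorting needed)
breaks <- quantile(x, probs = c(0, 0.333, 0.66, 1))

# cut() turns each value into a factor level naming its interval;
# include.lowest = TRUE keeps the minimum (36) in the first interval
grp <- cut(x, breaks, include.lowest = TRUE)

# Count the observations that fall into each interval
sizes <- table(grp)
print(sizes)
```

With other data or other probabilities the counts can come out uneven, so a check like this is cheap insurance before computing the means.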
Probably something along the following lines:

> x <- c( 46, 125 , 36 ,193, 209, 78, 66, 242 , 297 , 45)
> sorted <- c(36 , 45 , 46, 66, 78, 125,193, 209, 242, 297)
> tapply(sorted, INDEX = (seq_along(sorted) - 1) %/% 3, FUN = mean)
        0         1         2         3
 42.33333  89.66667 214.66667 297.00000

Hope this helps,
Giovanni

On Tue, 2012-04-03 at 08:47 -0400, Val wrote:
> [...]

--
Giovanni Petris <GPetris at uark.edu>
Associate Professor
Department of Mathematical Sciences
University of Arkansas - Fayetteville, AR 72701
Ph: (479) 575-6324, 575-8630 (fax)
http://definetti.uark.edu/~gpetris/
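Note that the integer-division index in Giovanni's reply yields four groups for ten points, because the tenth observation ends up in a group of its own. A minimal sketch of one way to fold the leftover into the last group, assuming three groups are wanted; `n_groups`, `size` and `grp` are illustrative names, not part of the original reply:

```r
x <- c(46, 125, 36, 193, 209, 78, 66, 242, 297, 45)

n_groups <- 3                    # assumed number of groups
size <- length(x) %/% n_groups   # base group size (3 for ten points)
sorted <- sort(x)

# Integer-divide the zero-based position by the group size, then cap the
# index at the last group so leftover observations join it
grp <- pmin((seq_along(sorted) - 1) %/% size, n_groups - 1) + 1

group_means <- tapply(sorted, grp, mean)
print(group_means)
```

This groups purely by position in the sorted vector, so the group sizes are fixed in advance regardless of how the values are spread.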
Ignoring the fact that your desired answers are wrong, I'd split the separating part and the group-means part into three steps:

i) quantile() can help you get the split points,
ii) findInterval() can assign each y to a group,
iii) then ave() or tapply() will do the group-wise means.

Something like:

y <- c(36, 45, 46, 66, 78, 125, 193, 209, 242, 297) # You need a "c" here.

ave(y, findInterval(y, quantile(y, c(0.33, 0.66))))

tapply(y, findInterval(y, quantile(y, c(0.33, 0.66))), mean)

You could also use cut2 from the Hmisc package to combine findInterval and quantile into a single step, depending on your desired output.

Hope that helps,
Michael

On Tue, Apr 3, 2012 at 8:47 AM, Val <valkremk at gmail.com> wrote:
> [...]
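Michael's three steps can also be spelled out with named intermediate results, which makes the difference between tapply() and ave() easier to see. A minimal sketch using the same 0.33 and 0.66 probabilities; the variable names are illustrative only:

```r
y <- c(36, 45, 46, 66, 78, 125, 193, 209, 242, 297)

# i) split points at the 0.33 and 0.66 quantiles
breaks <- quantile(y, probs = c(0.33, 0.66))

# ii) findInterval() returns 0 below the first break, 1 between the
#     two breaks, and 2 at or above the second break
grp <- findInterval(y, breaks)

# iii) tapply() gives one mean per group; ave() returns a vector as long
#      as y, with each value replaced by the mean of its group
group_means <- tapply(y, grp, mean)
per_obs <- ave(y, grp)

print(group_means)
print(per_obs)
```

tapply() is the natural choice for a summary table, while ave() is handy when the group mean should sit next to each observation, for example as a new column in a data frame.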
Hi!

Maybe not the most elegant solution, but it works:

data <- c(46, 125, 36, 193, 209, 78, 66, 242, 297, 45)
sorted <- sort(data)                 # sort once, outside the loop
n <- length(sorted)
starts <- seq(1, n - (n %% 3), 3)    # first index of each group: 1, 4, 7
for (i in starts) {
  # every group has three values, except the last, which runs to the end
  idx <- if (i < max(starts)) i:(i + 2) else i:n
  print(sorted[idx])
  print(mean(sorted[idx]))
}

Produces:

[1] 36 45 46
[1] 42.33333
[1]  66  78 125
[1] 89.66667
[1] 193 209 242 297
[1] 235.25

HTH,
Kimmo