Dear R-List, 
I would like to recode my data according to quantile breaks, i.e. all data
within the range of 0%-25% should get a 1, >25%-50% a 2 etc.
Is there a nice way to do this for all columns in a data frame?
e.g.
df<-data.frame(id=c("x01","x02","x03","x04","x05","x06"),a=c(1,2,3,4,5,6),b=c(2,4,6,8,10,12),c=c(1,3,9,12,15,18))
df
   id        a      b      c
1 x01     1      2      1
2 x02     2      4      3
3 x03     3      6      9
4 x04     4      8     12
5 x05     5     10     15
6 x06     6     12     18
#I can do it in a very complicated way
apply(df[-1],2,quantile)
       a    b    c
0%   1.0  2.0  1.0
25%  2.2  4.5  4.5
50%  3.5  7.0 10.5
75%  4.8  9.5 14.2
100% 6.0 12.0 18.0
#then 
df$a[df$a<=2.2]<-1
...
#result should be
df.breaks
id   a  b  c
x01  1  1  1
x02  1  1  1
x03  2  2  2
x04  3  3  3
x05  4  4  4
x06  4  4  4
But there must be a way to do it more elegantly, something like
df.breaks<- apply(df[-1],2,recode.by.quantile)
Can anyone help me with this?
Best wishes 
Alain      
Hi Alain,
The following should get you started:
apply(df[,-1], 2, function(x) cut(x, breaks = quantile(x), include.lowest = TRUE, labels = 1:4))
Check ?cut and ?apply for more information.
HTH,
Jorge.-
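A minimal sketch of the same idea that keeps the id column and returns integer codes rather than the character matrix that apply() builds from factors. The helper name recode.by.quantile is taken from the question and is hypothetical, not an existing R function; the data frame is the one from the original post.

```r
# Example data from the original post
df <- data.frame(id = c("x01", "x02", "x03", "x04", "x05", "x06"),
                 a  = c(1, 2, 3, 4, 5, 6),
                 b  = c(2, 4, 6, 8, 10, 12),
                 c  = c(1, 3, 9, 12, 15, 18))

# Hypothetical helper: recode one numeric vector by its own quartiles.
# labels = FALSE makes cut() return integer bin codes instead of a factor;
# include.lowest = TRUE puts the minimum into the first bin.
recode.by.quantile <- function(x) {
  cut(x, breaks = quantile(x), include.lowest = TRUE, labels = FALSE)
}

# Apply the helper to every column except id, keeping the data frame shape
df.breaks <- df
df.breaks[-1] <- lapply(df[-1], recode.by.quantile)
df.breaks
```

One caveat: if a column has many tied values, quantile() can return duplicated breaks, and cut() then stops with an error saying the breaks are not unique, so such columns would need separate handling.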
Hi Alain,
Try this:
df.breaks<-data.frame(id=df[,1],sapply(df[,-1],function(x)
findInterval(x,quantile(x),rightmost.closed=TRUE)),stringsAsFactors=FALSE)
df.breaks
#   id a b c
#1 x01 1 1 1
#2 x02 1 1 1
#3 x03 2 2 2
#4 x04 3 3 3
#5 x05 4 4 4
#6 x06 4 4 4
A.K.
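The cut() and findInterval() suggestions in this thread agree on the example data, but they treat values that fall exactly on an interior break differently: cut() uses right-closed intervals by default, while findInterval() uses left-closed ones. A small sketch with an illustrative vector (x is made up for this comparison and is not from the original post):

```r
# Illustrative vector whose quartiles coincide with the data values
x <- 1:5
q <- quantile(x)   # breaks are 1, 2, 3, 4, 5

# cut(): intervals [1,2], (2,3], (3,4], (4,5]
by.cut <- cut(x, breaks = q, include.lowest = TRUE, labels = FALSE)

# findInterval(): intervals [1,2), [2,3), [3,4), [4,5]
by.findInterval <- findInterval(x, q, rightmost.closed = TRUE)

by.cut            # 1 1 2 3 4
by.findInterval   # 1 2 3 4 4
```

Since the question asks for ">25%-50%" to map to 2, the right-closed behaviour of cut() seems closer to what was requested when values coincide with a quantile.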
----- Original Message -----
From: D. Alain <dialvac-r at yahoo.de>
To: Mailinglist R-Project <r-help at r-project.org>
Cc: 
Sent: Tuesday, February 19, 2013 5:01 AM
Subject: [R] recode data according to quantile breaks
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.