Dear R-List,

I would like to recode my data according to quantile breaks, i.e. all data within the range 0%-25% should get a 1, >25%-50% a 2, etc. Is there a nice way to do this for all columns in a data frame?

e.g.

df <- data.frame(id=c("x01","x02","x03","x04","x05","x06"),a=c(1,2,3,4,5,6),b=c(2,4,6,8,10,12),c=c(1,3,9,12,15,18))

df
   id a  b  c
1 x01 1  2  1
2 x02 2  4  3
3 x03 3  6  9
4 x04 4  8 12
5 x05 5 10 15
6 x06 6 12 18

#I can do it in a very complicated way:

apply(df[-1],2,quantile)
       a    b    c
0%   1.0  2.0  1.0
25%  2.2  4.5  4.5
50%  3.5  7.0 10.5
75%  4.8  9.5 14.2
100% 6.0 12.0 18.0

#then

df$a[df$a<=2.2]<-1
...

#result should be

df.breaks
 id a b c
x01 1 1 1
x02 1 1 1
x03 2 2 2
x04 3 3 3
x05 4 4 4
x06 4 4 4

But there must be a more elegant way, something like

df.breaks <- apply(df[-1],2,recode.by.quantile)

Can anyone help me with this?

Best wishes

Alain
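For reference, here is a minimal sketch of the recode.by.quantile() helper the question asks for. The name is the hypothetical one from the question, not an existing R function. The sketch assumes quantile()'s default type 7, quartile breaks, and that the quartiles of each column are distinct. It uses cut() with labels = FALSE to get integer codes, and lapply() instead of apply() so the data frame is not coerced to a matrix and the id column is kept.

```r
# Hypothetical helper (name taken from the question): recode a numeric
# vector into quantile groups 1..4. Assumes quantile() type 7 (the default)
# and that the resulting breaks are all distinct.
recode.by.quantile <- function(x, probs = seq(0, 1, 0.25)) {
  breaks <- quantile(x, probs = probs, na.rm = TRUE)
  # Intervals are closed on the right, (b[i], b[i+1]]; include.lowest puts
  # the minimum into group 1. labels = FALSE returns integer group codes.
  cut(x, breaks = breaks, include.lowest = TRUE, labels = FALSE)
}

df <- data.frame(id = c("x01", "x02", "x03", "x04", "x05", "x06"),
                 a = c(1, 2, 3, 4, 5, 6),
                 b = c(2, 4, 6, 8, 10, 12),
                 c = c(1, 3, 9, 12, 15, 18))

# lapply() walks the numeric columns one by one; assigning into df.breaks[-1]
# keeps the id column untouched.
df.breaks <- df
df.breaks[-1] <- lapply(df[-1], recode.by.quantile)
df.breaks
```

Because the intervals are closed on the right, this follows the "0%-25% gets a 1, >25%-50% a 2" wording of the question.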
Hi Alain,

The following should get you started:

apply(df[,-1], 2, function(x) cut(x, breaks = quantile(x), include.lowest = TRUE, labels = 1:4))

Check ?cut and ?apply for more information.

HTH,
Jorge.-

On Tue, Feb 19, 2013 at 9:01 PM, D. Alain wrote:
> Dear R-List,
>
> I would like to recode my data according to quantile breaks, i.e. all data
> within the range of 0%-25% should get a 1, >25%-50% a 2 etc.
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
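One caveat with the cut() call above: it assumes the five quartiles of every column are distinct. With heavily tied data some of them coincide, and cut() stops with an error. The small vector below is made up for illustration, and dropping the duplicated breaks with unique() is only one possible workaround (it yields fewer than four groups), not something suggested in the thread.

```r
# Made-up column with many ties: several quartiles coincide.
x <- c(1, 1, 1, 1, 2, 3)
q <- quantile(x)               # 0%=1, 25%=1, 50%=1, 75%=1.75, 100%=3

# cut() refuses duplicated breaks and signals an error
# ("'breaks' are not unique"); capture the message instead of stopping.
res <- tryCatch(cut(x, breaks = q, include.lowest = TRUE, labels = 1:4),
                error = function(e) conditionMessage(e))
res

# Assumed workaround: drop duplicated breaks and return integer codes,
# accepting that this column ends up with fewer than 4 groups.
codes <- cut(x, breaks = unique(q), include.lowest = TRUE, labels = FALSE)
codes
```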
Hi Alain,

Try this:

# findInterval() gives the quartile group of each value; rightmost.closed=TRUE keeps the maximum in group 4
df.breaks<-data.frame(id=df[,1],sapply(df[,-1],function(x) findInterval(x,quantile(x),rightmost.closed=TRUE)),stringsAsFactors=FALSE)
df.breaks
#   id a b c
#1 x01 1 1 1
#2 x02 1 1 1
#3 x03 2 2 2
#4 x04 3 3 3
#5 x05 4 4 4
#6 x06 4 4 4

A.K.

----- Original Message -----
From: D. Alain <dialvac-r at yahoo.de>
To: Mailinglist R-Project <r-help at r-project.org>
Sent: Tuesday, February 19, 2013 5:01 AM
Subject: [R] recode data according to quantile breaks
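The cut() and findInterval() answers agree on Alain's example but not in general. cut() uses right-closed intervals (q[i], q[i+1]], while findInterval() uses left-closed ones [q[i], q[i+1]), so a value that falls exactly on an interior quartile lands one group higher with findInterval(). The made-up vector below shows this, and also why rightmost.closed = TRUE is needed. The left.open argument only exists in newer versions of R than the one current when this thread was written, so treat that line as an assumption about your R version.

```r
# Made-up vector whose values fall exactly on its own quartiles.
x <- c(1, 2, 3, 4, 5)
q <- quantile(x)           # 0%=1, 25%=2, 50%=3, 75%=4, 100%=5

# cut(): right-closed intervals; include.lowest puts the minimum in group 1.
by.cut <- cut(x, breaks = q, include.lowest = TRUE, labels = FALSE)

# findInterval(): left-closed intervals, so a value equal to an interior
# quartile moves up one group; rightmost.closed keeps the maximum in group 4.
by.fi <- findInterval(x, q, rightmost.closed = TRUE)

# Without rightmost.closed the maximum falls into a fifth group.
by.fi.open <- findInterval(x, q)

# left.open = TRUE (newer R only) makes the intervals right-closed like cut();
# rightmost.closed then means the leftmost interval is closed.
by.fi.left <- findInterval(x, q, rightmost.closed = TRUE, left.open = TRUE)

rbind(by.cut, by.fi, by.fi.open, by.fi.left)
```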