Dark
2013-Jul-31 09:37 UTC
[R] Add a column to a data frame with value based on the percentile of the row
Hi all, I think this should be an easy question for the guru's out here. I have this large data frame (2.500.000 rows, 15 columns) and I want to add a column named "SEGMENT" to it. The first 5% rows (first 125.000 rows) should have the value "Top 5%" in the SEGMENT column Then the rows from 5% to 20% should have the value "5 to 20" Then 20-50% should have the value "20 to 50" And the last 50% of the rows should have the value "Bottom 50" What is the easiest way of doing this? I was thinking of using quantile but then I should have some rownumber column. Regards Derk -- View this message in context: http://r.789695.n4.nabble.com/Add-a-column-to-a-data-frame-with-value-based-on-the-percentile-of-the-row-tp4672711.html Sent from the R help mailing list archive at Nabble.com.
Rui Barradas
2013-Jul-31 11:22 UTC
[R] Add a column to a data frame with value based on the percentile of the row
Hello, Combine quantile() with findInterval(). Something like the following. # sample data x <- rnorm(100) val <- c("Bottom 50", "20 to 50", "5 to 20", "Top 5%") qq <- quantile(x, probs = c(0, 0.50, 0.70, 0.95, 1)) idx <- findInterval(x, qq) val[idx] Hope this helps, Rui Barradas Em 31-07-2013 10:37, Dark escreveu:> Hi all, > > I think this should be an easy question for the guru's out here. > > I have this large data frame (2.500.000 rows, 15 columns) and I want to add > a column named "SEGMENT" to it. > The first 5% rows (first 125.000 rows) should have the value "Top 5%" in the > SEGMENT column > Then the rows from 5% to 20% should have the value "5 to 20" > Then 20-50% should have the value "20 to 50" > And the last 50% of the rows should have the value "Bottom 50" > > What is the easiest way of doing this? I was thinking of using quantile but > then I should have some rownumber column. > > Regards Derk > > > > -- > View this message in context: http://r.789695.n4.nabble.com/Add-a-column-to-a-data-frame-with-value-based-on-the-percentile-of-the-row-tp4672711.html > Sent from the R help mailing list archive at Nabble.com. > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >
arun
2013-Jul-31 11:48 UTC
[R] Add a column to a data frame with value based on the percentile of the row
Hi, May be this helps: set.seed(24) dat1<- data.frame(ID=1:500,value=rnorm(500)) indx<-round(quantile(as.numeric(row.names(dat1)),probs=c(0.05,0.20,0.50,1))) indx1<-findInterval(row.names(dat1),indx,rightmost.closed=TRUE) dat1$SEGMENT<- as.character(factor(indx1,labels=c("Top 5%","5 to 20","20 to 50", "Bottom 50"))) head(dat1) #? ID????? value SEGMENT #1? 3 -0.7859574? Top 5% #2? 3? 1.0117428? Top 5% #3? 8 -2.1558035? Top 5% #4? 6? 1.7803880? Top 5% #5? 7? 0.4192816? Top 5% #6 10 -1.0142512? Top 5% ?tail(dat1) #??? ID????? value?? SEGMENT #495? 1? 0.3571848 Bottom 50 #496? 9 -1.1971854 Bottom 50 #497? 5? 0.3544896 Bottom 50 #498? 8 -0.1562356 Bottom 50 #499? 8 -0.2994321 Bottom 50 #500? 8 -0.4170319 Bottom 50 A.K. ----- Original Message ----- From: Dark <info at software-solutions.nl> To: r-help at r-project.org Cc: Sent: Wednesday, July 31, 2013 5:37 AM Subject: [R] Add a column to a data frame with value based on the percentile of the row Hi all, I think this should be an easy question for the guru's out here. I have this large data frame (2.500.000 rows, 15 columns) and I want to add a column named "SEGMENT" to it. The first 5% rows (first 125.000 rows) should have the value "Top 5%" in the SEGMENT column Then the rows from 5% to 20% should have the value "5 to 20" Then 20-50% should have the value "20 to 50" And the last 50% of the rows should have the value "Bottom 50" What is the easiest way of doing this? I was thinking of using quantile but then I should have some rownumber column. Regards Derk -- View this message in context: http://r.789695.n4.nabble.com/Add-a-column-to-a-data-frame-with-value-based-on-the-percentile-of-the-row-tp4672711.html Sent from the R help mailing list archive at Nabble.com. ______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.