Ram H. Sharma
2011-Mar-13 15:49 UTC
[R] replace with quantile value for a large data frame...
Dear R-Experts,

I am sure this looks like a simple question to experts, but it is a problem for me. I have a large data frame with over 1000 variables, each with a different distribution (i.e. different quantiles). I want to create a new grouped data frame in which the values falling in the first (<25%), second (25% to <50%), third (50% to <75%) and fourth (>75%) quartiles are replaced with 1, 2, 3 and 4 respectively. The following example is just to work it out.

# my example:
X1 <- c(1:10)
X2 <- c(11:20)
X3 <- c(21:30)
X4 <- c(31:40)
X5 <- c(41:50)
dataf <- data.frame(X1, X2, X3, X4, X5)

# my efforts of the last week led me to this point
for (i along(length(dataf[1,]))) {
  qntfun <- function (x) {
    XQ <- as.numeric(as.matrix(quantile(x)))
    Q1 <- XQ[1]
    Q2 <- XQ[2]
    Q3 <- XQ[3]
    Q4 <- XQ[4]
    for (i in 1:length(x)){
      if (x[i] < Q2) {
        x[i] <- 1
      } else {
        if ( x[i] > Q2 & x[i] < Q3){
          x[i] <- 2
        } else {
          if ( x[i] > Q3 & x[i] < Q4) {
            x[i] <- 3
          } else {
            if (x[i] > Q4) {
              x[i] <- 4
            } else{
              x[i] <- 0
            }
          }
        }
      }
    }
  }
  apply(dataf, 1:length(dataf), qntfun)
}

# I get an error that I cannot fix. I would be glad to see a slimmer solution, but I could not think of any.

Thanks in advance for your help.

Ram Sharma
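For reference, the loop posted above appears to fail for several separate reasons: `for (i along(...))` is not valid R syntax (the keyword is `in`), `qntfun` ends with a `for` loop and so never returns the modified vector, `apply()` on a data frame takes a margin of 1 (rows) or 2 (columns) rather than `1:length(dataf)`, and the strict `<` / `>` comparisons send any value exactly equal to a quantile to 0. The sketch below keeps the poster's names `dataf` and `qntfun` but replaces the inner loop with vectorised comparisons. Which side of each boundary a value falls on is an assumption, taken from the intervals stated in the question.

```r
# Example data from the question
dataf <- data.frame(X1 = 1:10, X2 = 11:20, X3 = 21:30, X4 = 31:40, X5 = 41:50)

qntfun <- function(x) {
  # Only the three inner cut points (25%, 50%, 75%) are needed
  q <- quantile(x, probs = c(0.25, 0.50, 0.75), names = FALSE)
  # Each comparison is TRUE/FALSE, so the sum counts how many cut points
  # a value has reached (0 to 3); adding 1 gives the group number 1 to 4
  1L + (x >= q[1]) + (x >= q[2]) + (x >= q[3])
}

# A data frame is a list of columns, so lapply() applies qntfun per variable
grouped <- as.data.frame(lapply(dataf, qntfun))
```

Because the function works on a whole column at once, no outer `for` loop over the variables is needed.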
Dimitris Rizopoulos
2011-Mar-13 16:55 UTC
[R] replace with quantile value for a large data frame...
One way is the following:

X1 <- c(1:10)
X2 <- c(11:20)
X3 <- c(21:30)
X4 <- c(31:40)
X5 <- c(41:50)
DF <- data.frame(X1, X2, X3, X4, X5)

as.data.frame(sapply(DF, function (x) {
    qx <- quantile(x)
    cut(x, qx, include.lowest = TRUE, labels = 1:4)
}))

You may also have a look at function cut2() from package Hmisc.

I hope it helps.

Best,
Dimitris

-- 
Dimitris Rizopoulos
Assistant Professor
Department of Biostatistics
Erasmus University Medical Center

Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands
Tel: +31/(0)10/7043478
Fax: +31/(0)10/7043014
Web: http://www.erasmusmc.nl/biostatistiek/
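A note on the result type of the `cut()` approach: `cut()` returns a factor, and `sapply()` simplifies a list of factors into a matrix, so the columns typically come back as character or factor rather than as numbers. If numeric group codes are wanted, one possible variant is to ask `cut()` for integer codes with `labels = FALSE` and to assign the result back column by column. The helper name `to_quartile` and the choice of `right = FALSE` (so that a value equal to the 25% quantile lands in group 2, as the question describes) are assumptions made for this sketch, not part of the reply above.

```r
DF <- data.frame(X1 = 1:10, X2 = 11:20, X3 = 21:30, X4 = 31:40, X5 = 41:50)

# Hypothetical helper: quartile group (1 to 4) for each element of x
to_quartile <- function(x) {
  qx <- quantile(x, na.rm = TRUE)   # 0%, 25%, 50%, 75%, 100%
  # labels = FALSE returns integer codes instead of a factor;
  # right = FALSE makes the intervals closed on the left: [a, b)
  # include.lowest = TRUE then keeps the maximum in the last group
  cut(x, breaks = qx, labels = FALSE, include.lowest = TRUE, right = FALSE)
}

DF_grouped <- DF
DF_grouped[] <- lapply(DF, to_quartile)   # DF_grouped[] keeps the data frame shape
```

One caveat: `cut()` stops with an error when the breaks are not unique, which can happen for a variable with many tied values, so such columns would need separate handling.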