thr3ads.net - R help - [R] replace with quantile value for a large data frame... [Mar 2011]

Ram H. Sharma

2011-Mar-13 15:49 UTC

[R] replace with quantile value for a large data frame...

Dear R-Experts,

This may look like a simple question to experts, but it is a problem
for me. I have a large data frame with over 1000 variables, each with a
different distribution (i.e. different quantiles). I want to create a
new grouped data frame in which each value is replaced by 1, 2, 3 or 4
according to whether it falls in the first (<25%), second (25% to <50%),
third (50% to <75%) or fourth (>75%) quartile of its variable. The
following example is just something to work with.
# my example:
X1 <- c(1:10)
X2 <- c(11:20)
X3 <- c(21:30)
X4 <- c(31:40)
X5 <- c(41:50)
dataf <- data.frame(X1, X2, X3, X4, X5)

# my efforts of the last week led me to this point
for (i along(length(dataf[1,]))) {
  qntfun <- function (x) {
    XQ <- as.numeric(as.matrix(quantile(x)))
    Q1 <- XQ[1]
    Q2 <- XQ[2]
    Q3 <- XQ[3]
    Q4 <- XQ[4]
    for (i in 1:length(x)) {
      if (x[i] < Q2) {
        x[i] <- 1
      } else {
        if (x[i] > Q2 & x[i] < Q3) {
          x[i] <- 2
        } else {
          if (x[i] > Q3 & x[i] < Q4) {
            x[i] <- 3
          } else {
            if (x[i] > Q4) {
              x[i] <- 4
            } else {
              x[i] <- 0
            }
          }
        }
      }
    }
  }
  apply(dataf, 1:length(dataf), qntfun)
}
I got an error that I cannot fix. I would be glad to see a slimmer
solution, but I could not think of any.

Thanks in advance for your help.

Ram Sharma


Dimitris Rizopoulos

2011-Mar-13 16:55 UTC

[R] replace with quantile value for a large data frame...

One way is the following:

X1 <- c(1:10)
X2 <- c(11:20)
X3 <- c(21:30)
X4 <- c(31:40)
X5 <- c(41:50)
DF <- data.frame(X1, X2, X3, X4, X5)

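as.data.frame(sapply(DF, function (x) {
     # quantile(x) returns the 0%, 25%, 50%, 75% and 100% cut points
     qx <- quantile(x)
     # include.lowest = TRUE keeps the column minimum in the first group
     cut(x, qx, include.lowest = TRUE,
         labels = 1:4)
}))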
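
A note on the result above: as far as I can tell, sapply() simplifies the factors returned by cut() into a character matrix, so the columns come back as text (or factors, depending on the R version) rather than numbers, and cut() uses right-closed bins. Below is a minimal sketch of an alternative, assuming integer codes and the left-closed bins from the original question (25% to <50%, and so on) are wanted. The names quartile_code and grouped are illustrative and do not come from the thread.

```r
DF <- data.frame(X1 = 1:10, X2 = 11:20, X3 = 21:30, X4 = 31:40, X5 = 41:50)

# Hypothetical helper: map each value to its quartile group (1 to 4).
quartile_code <- function(x) {
  # Inner cut points only: the 25%, 50% and 75% quantiles.
  qx <- quantile(x, probs = c(0.25, 0.5, 0.75), na.rm = TRUE)
  # findInterval() counts how many cut points are <= each value (0 to 3),
  # so adding 1 gives left-closed groups 1 to 4.
  findInterval(x, qx) + 1L
}

# lapply() returns one integer vector per column; assigning into grouped[]
# keeps the data frame shape and the column names.
grouped <- DF
grouped[] <- lapply(DF, quartile_code)
```

Because findInterval() does not require distinct cut points, this should also run on columns with many tied values, where cut() would stop with a "'breaks' are not unique" error.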


You may also have a look at function cut2() from package Hmisc.
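
As a rough illustration of that suggestion, the sketch below assumes the Hmisc package is installed and uses cut2() with g = 4 to form four quantile groups. The exact group boundaries are chosen by cut2() and may differ slightly from those of cut() above. The names cut2_code and grouped2 are illustrative.

```r
DF <- data.frame(X1 = 1:10, X2 = 11:20, X3 = 21:30, X4 = 31:40, X5 = 41:50)

# Only run if Hmisc is available (assumption: it may not be installed).
if (requireNamespace("Hmisc", quietly = TRUE)) {
  # Hypothetical helper: four quantile groups per column, as integer codes.
  cut2_code <- function(x) as.integer(Hmisc::cut2(x, g = 4))

  grouped2 <- DF
  grouped2[] <- lapply(DF, cut2_code)
}
```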


I hope it helps.

Best,
Dimitris


-- 
Dimitris Rizopoulos
Assistant Professor
Department of Biostatistics
Erasmus University Medical Center

Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands
Tel: +31/(0)10/7043478
Fax: +31/(0)10/7043014
Web: http://www.erasmusmc.nl/biostatistiek/
