thr3ads.net - R help - [R] How to break data in quantiles properly? [Apr 2005]

Eric Rodriguez

2005-Apr-27 14:31 UTC

[R] How to break data in quantiles properly?

Hi,

I would like to break a dataset into nclasses quantile classes.
Until now, I have used the following code:
Classify.Quantile <- function (dataset, nclasses = 10) 
{
	n.probs <- seq(0,1,length=nclasses+1)
	n.labels = paste("C", 1:nclasses-1, sep="")
	n.rows <- nrow(dataset)
	n.cols <- ncol(dataset)
	n.motif <- dataset
	
	for (j in 2:n.cols)
	{
		cat(j, "  ");
		discr = n.labels[unclass(cut(dataset[,j],quantile(dataset[,j],n.probs),include.lowest=T))]
		n.motif[,j] = discr
	}
	
	res <- list(motif=n.motif, labels=n.labels, n.classes=nclasses)
	return(res)
}


But if you call this on a dataset containing many identical values, you get:
Error in cut.default(dataset[, j], quantile(dataset[, j], n.probs), include.lowest = T) :
        cut: breaks are not unique

I understand perfectly why, but I would like to know how to avoid this behaviour.

For example, this code raises the error:
x=matrix(0,1000,2)  # two columns: the loop in the function skips column 1
x[100,2]=1
Classify.Quantile(x, 10)

Of course this dataset is a bit extreme, but data with very small
variance does occur.


Thanks for any help you could provide

Frank E Harrell Jr

2005-Apr-27 16:12 UTC

[R] How to break data in quantiles properly?

Eric Rodriguez wrote:
> Hi,
>
> I would like to break a dataset into nclasses quantile classes.
> Until now, I have used the following code:
> Classify.Quantile <- function (dataset, nclasses = 10)
> {
> 	n.probs <- seq(0,1,length=nclasses+1)
> 	n.labels = paste("C", 1:nclasses-1, sep="")
> 	n.rows <- nrow(dataset)
> 	n.cols <- ncol(dataset)
> 	n.motif <- dataset
>
> 	for (j in 2:n.cols)
> 	{
> 		cat(j, "  ");
> 		discr = n.labels[unclass(cut(dataset[,j],quantile(dataset[,j],n.probs),include.lowest=T))]
> 		n.motif[,j] = discr
> 	}
>
> 	res <- list(motif=n.motif, labels=n.labels, n.classes=nclasses)
> 	return(res)
> }
>
>
> But if you call this on a dataset containing many identical values, you get:
> Error in cut.default(dataset[, j], quantile(dataset[, j], n.probs), include.lowest = T) :
>         cut: breaks are not unique
>
> I understand perfectly why, but I would like to know how to avoid this behaviour.
>
> For example, this code raises the error:
> x=matrix(0,1000,2)  # two columns: the loop in the function skips column 1
> x[100,2]=1
> Classify.Quantile(x, 10)
>
> Of course this dataset is a bit extreme, but data with very small
> variance does occur.
>
>
> Thanks for any help you could provide

The cut2 function in the Hmisc package may help.  -FH

-- 
Frank E Harrell Jr   Professor and Chair           School of Medicine
                      Department of Biostatistics   Vanderbilt University
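
For readers without Hmisc installed, here is a minimal base-R sketch of one way to avoid the "breaks are not unique" error. It is not taken from the thread: the helper name classify_quantile is hypothetical, and the approach (dropping repeated quantile breaks with unique()) is an assumption about what is acceptable, since it can return fewer than nclasses classes when the data are heavily tied. It also assumes the input has no missing values. The Hmisc route suggested above would be along the lines of cut2(x, g = 10), which is not run here.

```r
# Hypothetical helper (not from the thread): quantile classes that tolerate ties.
# Assumes x is a numeric vector without NA values.
classify_quantile <- function(x, nclasses = 10) {
  probs <- seq(0, 1, length.out = nclasses + 1)
  # Heavily tied data yields repeated quantiles; unique() drops the repeats,
  # so fewer than nclasses classes may be returned.
  breaks <- unique(quantile(x, probs = probs, names = FALSE))
  if (length(breaks) < 2) {
    # Constant data: everything falls into a single class.
    return(factor(rep("C0", length(x))))
  }
  cls <- cut(x, breaks = breaks, include.lowest = TRUE, labels = FALSE)
  # Classes are numbered from C0 in order of the surviving breaks.
  factor(paste0("C", cls - 1))
}

# The extreme case from the question: 999 zeros and a single 1.
x <- c(rep(0, 999), 1)
table(classify_quantile(x, 10))
```

With this input all quantile breaks except the maximum are 0, so only one interval survives and every value ends up in the same class. Whether collapsing classes like this is appropriate depends on what the classes are used for afterwards.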
