Hi,

I would like to break a dataset into n.classes quantiles.
Till now, I used the following code:

  Classify.Quantile <- function (dataset, nclasses = 10)
  {
    n.probs <- seq(0,1,length=nclasses+1)
    n.labels = paste("C", 1:nclasses-1, sep="")
    n.rows <- nrow(dataset)
    n.cols <- ncol(dataset)
    n.motif <- dataset

    for (j in 2:n.cols)
    {
      cat(j, " ");
      discr = n.labels[unclass(cut(dataset[,j],quantile(dataset[,j],n.probs),include.lowest=T))]
      n.motif[,j] = discr
    }

    res <- list(motif=n.motif, labels=n.labels, n.classes=nclasses)
    return(res)
  }

But if you call this on a dataset with many identical values, you get:

  Error in cut.default(dataset[, j], quantile(dataset[, j], n.probs), include.lowest = T) :
    cut: breaks are not unique

I perfectly understand why, but I would like to know how to avoid this behaviour.

For example, use this code to raise the error:

  x=matrix(0,1000,1)
  x[100]=1
  Classify.Quantile(x, 10)

Of course this dataset is a bit extreme, but it does happen that one gets data with very small variance.

Thanks for any help you could provide.

Eric Rodriguez wrote:
> I would like to break a dataset in n.classes quantiles.
> [...]
> but if you try to call this with a dataset with a lot of same value, you got a
> Error in cut.default(dataset[, j], quantile(dataset[, j], n.probs), include.lowest = T) :
>   cut: breaks are not unique
>
> I perfectly understand why but I would like to know how to avoid this behaviour.

The cut2 function in the Hmisc package may help.

-FH
-- 
Frank E Harrell Jr
Professor and Chair
School of Medicine
Department of Biostatistics
Vanderbilt University
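
For readers who want to stay in base R, here is a minimal sketch of one way to avoid the error: drop the duplicated quantile breaks before calling cut. This is not what the reply suggests (that is cut2 from Hmisc), and the helper name quantile_classes is hypothetical, made up for illustration. It works on a single numeric vector, so in the function from the question it would stand in for the cut(...) call on each column.

```r
# Hypothetical helper (not from the thread): quantile binning that
# tolerates tied quantiles by dropping duplicated break points.
quantile_classes <- function(x, nclasses = 10) {
  probs  <- seq(0, 1, length.out = nclasses + 1)
  breaks <- unique(quantile(x, probs, na.rm = TRUE, names = FALSE))

  if (length(breaks) < 2) {
    # Constant column: every non-missing value goes into a single class.
    return(ifelse(is.na(x), NA_character_, "C0"))
  }

  # With fewer unique breaks there are fewer classes than requested.
  idx <- cut(x, breaks, include.lowest = TRUE, labels = FALSE)
  ifelse(is.na(idx), NA_character_, paste0("C", idx - 1))
}

# The data from the question: 999 zeros and a single 1.
x <- rep(0, 1000)
x[100] <- 1

# Non-zero result: the raw quantile breaks contain repeats,
# which is what makes cut() stop with "breaks are not unique".
anyDuplicated(quantile(x, seq(0, 1, length.out = 11)))

classes <- quantile_classes(x, 10)
table(classes)
```

The price of this approach is that classes collapse: on the data above only two distinct breaks survive (0 and 1), so every value ends up in one class. If Hmisc is installed, Hmisc::cut2(x, g = 10) is the route the reply points to; its g argument sets the number of quantile groups.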