j.logsdon@lancaster.ac.uk
2001-Sep-27 15:37 UTC
[R] Making a factor with common levels ...
This is doing my head in. Staying away from R for too long is bad for the health!

I have two vectors of character names where there may be repetition and from which I want to form two factors with the same levels but only if there are more than N instances of each name in each vector.

I can get the list of common names quite easily, using:

nn<-sort(unique(c(levels(n1)[table(n1)>N],levels(n0)[table(n0)>N])))

Some of the factor levels may be empty for one of the factors but the same level must be present in the other.

Is there a simple way to extract nn0 and nn1 so that the pairs remain correctly aligned and each list has at least N cases of each name? Or do I have to jump into my steamroller and do a couple of loops?

TIA

John

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
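A minimal R sketch of one loop-free way to do this, under two assumptions: n0 and n1 are plain character vectors (the data below are made up), and the common level set is the union of frequent names, as in the nn line above. Counting with table() and taking names() avoids the levels() calls, which return NULL for a character vector. Passing the same levels = nn to factor() gives both factors identical, identically ordered levels; any name not in nn becomes NA. The helper frequent() is hypothetical.

```r
## Made-up example data: two character vectors of names
n0 <- c("ann", "bob", "bob", "cat", "cat", "cat", "dan")
n1 <- c("bob", "cat", "cat", "eve", "eve", "eve", "eve")
N  <- 1   # keep names seen more than N times

## Hypothetical helper: names occurring more than N times in x.
## table() works on character vectors directly, so no levels() is needed.
frequent <- function(x, N) {
  tab <- table(x)
  names(tab)[tab > N]
}

## Common level set: union across the two vectors, as in the post
nn <- sort(union(frequent(n0, N), frequent(n1, N)))

## Both factors share exactly the same levels in the same order;
## names outside nn are coded NA, levels absent from a vector count 0
f0 <- factor(n0, levels = nn)
f1 <- factor(n1, levels = nn)

table(f0)
table(f1)
```

If each name must exceed N in both vectors, replace union() with intersect(). Entries outside nn can be dropped with f0[!is.na(f0)], which keeps the shared level set intact.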
The combine.levels function in the Hmisc library is related to this:

combine.levels <- function(x, minlev=.05) {
  x <- as.factor(x)
  lev <- levels(x)
  f <- table(x)/sum(!is.na(x))
  i <- f < minlev
  si <- sum(i)
  if(si==0) return(x)
  levels(x) <- if(si==1) list(names(sort(f))[1:2]) else list(OTHER=names(f)[i])
  x
}

This combines levels that have a relative frequency below 'minlev' into new categories.

-Frank Harrell

--
Frank E Harrell Jr              Prof. of Biostatistics & Statistics
Div. of Biostatistics & Epidem. Dept. of Health Evaluation Sciences
U. Virginia School of Medicine  http://hesweb1.med.virginia.edu/biostat
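For comparison, a sketch of the same pooling idea written for present-day R. combine_rare() is a hypothetical name, not part of Hmisc, and "OTHER" is used as the pooled label in both branches (an assumption: the original passes an unnamed list when only one level is rare). Instead of assigning a list to levels(), it builds a full character vector of new level names, one per existing level; when several entries share a name, levels<- merges them into a single level.

```r
## Hypothetical function: pool rare levels of a factor into "OTHER"
combine_rare <- function(x, minlev = 0.05) {
  x <- as.factor(x)
  f <- table(x) / sum(!is.na(x))   # relative frequency, in levels(x) order
  rare <- f < minlev
  if (sum(rare) == 0) return(x)    # nothing to pool
  ## Only one rare level: pool the two least frequent levels instead,
  ## mirroring the si==1 branch of combine.levels
  if (sum(rare) == 1) rare <- names(f) %in% names(sort(f))[1:2]
  newlev <- levels(x)
  newlev[rare] <- "OTHER"          # duplicated labels merge those levels
  levels(x) <- newlev
  x
}

## Made-up data: "c", "d" and "e" each fall below 5%
x <- c(rep("a", 60), rep("b", 36), "c", "d", "e", "e")
y <- combine_rare(x)
table(y)
```

Building the full vector keeps every frequent level under its own name, so only the rare ones are relabelled.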
Possibly Parallel Threads
- help a newbie with a loop
- Help using tapply with multiple variables
- Training nnet in two ways, trying to understand the performance difference - with (i hope!) commented, minimal, self-contained, reproducible code
- llvm combines "ADD frameindex, constant" to OR
- indexing matrices with dimnames?