What is the quickest way to create many categorical variables
(factors) from continuous variables?

This is the approach that I have used:

# create sample data
N <- 20
x <- runif(N,0,1)

# setup ranges to define categories
x.a <- (x >= 0.0) & (x < 0.4)
x.b <- (x >= 0.4) & (x < 0.5)
x.c <- (x >= 0.5) & (x < 0.6)
x.d <- (x >= 0.6) & (x < 1.0)

# create factors
i <- runif(N,1,1)
x.new <- (i*1*x.a) + (i*2*x.b) + (i*3*x.c) + (i*4*x.d)
x.factor <- factor(x.new)

I'm looking for a better / simpler / more elegant / more robust (as
the number of categories increases) way to do this. I also don't
like that my factor names can only be numbers in this example. I
would prefer a solution to take a form like the following (inspired
by the "hist" function):

# define breakpoints
x.breaks = c(0, 0.4, 0.5, 0.6, 1.0)
x.factornames = c( "0 - 0.4", "0.4 - 0.5", "0.5 - 0.6", "0.6 - 1.0" )
x.factor = unknown.function( x, x.breaks, x.factornames )

Thanks,
David

P.S. Here's what I have read to try to find the answer to my problem:
* "Introductory Statistics with R"
* "A Brief Guide to R for Beginners in Econometrics"
* "Econometrics in R"
?cut

-- Bert Gunter
Genentech Non-Clinical Statistics
South San Francisco, CA

"The business of the statistician is to catalyze the scientific
learning process." - George E. P. Box

> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch
> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of David James
> Sent: Friday, August 26, 2005 2:00 PM
> To: r-help at stat.math.ethz.ch
> Subject: [R] Creating factors from continuous variables
>
> What is the quickest way to create many categorical variables
> (factors) from continuous variables?
?cut

This is covered in `An Introduction to R', the manual which ships with R
and is basic reading.

On Fri, 26 Aug 2005, David James wrote:

> What is the quickest way to create many categorical variables
> (factors) from continuous variables?

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
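
For reference, here is a minimal sketch of how cut() might be applied to the example in the question. The breakpoints and labels are the ones David gives. The set.seed() call and the choice of right = FALSE are assumptions added here: the first only makes the sample reproducible, and the second makes the intervals closed on the left, which matches the (x >= a) & (x < b) tests in the original code.

```r
# sample data, as in the original question
set.seed(1)  # assumption: added only so the sketch is reproducible
N <- 20
x <- runif(N, 0, 1)

# breakpoints and labels taken from the question
x.breaks <- c(0, 0.4, 0.5, 0.6, 1.0)
x.factornames <- c("0 - 0.4", "0.4 - 0.5", "0.5 - 0.6", "0.6 - 1.0")

# right = FALSE gives intervals of the form [a, b), matching the
# (x >= a) & (x < b) comparisons used in the question
x.factor <- cut(x, breaks = x.breaks, labels = x.factornames, right = FALSE)

# tabulate how many observations fall into each category
print(table(x.factor))
```

With right = FALSE a value exactly equal to the last breakpoint (1.0 here) would become NA; adding include.lowest = TRUE would put it in the last category instead. Omitting labels makes cut() build interval labels from the breakpoints.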