similar to: how to collapse categories or re-categorize variables?

2007 Nov 24
1
Project proposal/idea: Categorize traffic by behavior
Back in 2003/2004, when finding the topic for my master's thesis, I had a secondary project idea. Perhaps it's about time to do something about the idea and hear whether anyone else thinks it's a good one. The basic idea is to "categorize traffic by behavior". The categorization should be based upon things like packet timing characteristics and packet size, rather than standard port
2010 Nov 10
2
randomForest can not handle categorical predictors with more than 32 categories
I received this error: Error in randomForest.default(m, y, ...) : Can not handle categorical predictors with more than 32 categories. I was using the code below: library(randomForest); library(MASS); memory.limit(size=12999); x <- read.csv("D:/train_store_title_view.csv", header=TRUE); x <- na.omit(x); set.seed(131); sales.rf <- randomForest(sales ~ ., data=x, mtry=3, importance=TRUE) My
2011 Mar 09
2
collapse a data column into a row
I have a file with data in columnar format like below: probeID rc_AI104113_at rc_AI178259_f_at rc_AI179134_i_at rc_AI179134_f_at rc_AI104113_at rc_AA819429_f_at How can I rewrite it in the format below: 'rc_AI104113_at', 'rc_AI178259_f_at', 'rc_AI179134_i_at', 'rc_AI179134_f_at', 'rc_AI104113_at', 'rc_AA819429_f_at' Is there any function to do
2004 Feb 26
3
Collapsing Categorical Variables
Hi, Suppose I have a categorical variable called STREET, and I have 30 levels for it (i.e. 30 different streets). I want to find all those streets with only 15 observations or below then collapse them into a level called OTHER. Is there a quick way, other than using a for() loop, to do it? Currently what I'm doing is something like: ### Collapse STREET (those < 15) st <- c()
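The loop-free approach this question asks about can be sketched as follows. This is only an illustration under assumptions: the `street` factor and its counts are made-up data, and the threshold of 15 is taken from the question.

```r
# Hypothetical data: four streets with 40, 20, 10 and 5 observations.
street <- factor(rep(c("Main", "Oak", "Elm", "Pine"), times = c(40, 20, 10, 5)))

# Count observations per level and pick the levels with 15 or fewer.
counts <- table(street)
rare <- names(counts)[counts <= 15]

# Assigning the same label to several levels merges them into one level.
levels(street)[levels(street) %in% rare] <- "OTHER"

table(street)
```

Because `levels<-` merges duplicated labels, no explicit loop over observations is needed.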
2005 Mar 22
2
Error: Can not handle categorical predictors with more than 32 categories.
Hi All, My question is in regards to an error generated when using randomForest in R. Is there a special way to format the data in order to avoid this error, or am I completely confused on what the error implies? "Error in randomForest.default(m, y, ...) : Can not handle categorical predictors with more than 32 categories." This is generated from the command line: >
2011 Mar 01
2
regression with categorical nuisance variable
Hi, I am new to R, so I am unsure of the formula to set up this analysis. I would like to run a linear model with a continuous dependent variable (brain volume) and a continuous independent variable (age) while controlling for a categorical nuisance variable (gender). Age and brain volume are correlated. There are no gender differences in age but there are significant gender differences in brain
2012 Jan 26
2
How do I use the cut function to assign specific cut points?
I am new to R, and I am trying to cut a continuous variable BMI into different categories and can't figure out how to use it. I would like to cut it into four groups: <20, 20-25, 25-30 and >= 30. I am having difficulty figuring out the code for <20 and >=30. Please help. Thank you.
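One possible way to express these cut points is with open-ended breaks. This is a sketch under assumptions: the `bmi` values are invented, and the boundaries are read as [20, 25) and [25, 30), which the question does not state explicitly.

```r
# Hypothetical BMI values, including the boundary cases 20, 25 and 30.
bmi <- c(18.5, 20, 24.9, 25, 29.9, 30, 41)

# -Inf and Inf give the open-ended first and last groups;
# right = FALSE makes each interval closed on the left: [a, b).
bmi_group <- cut(bmi,
                 breaks = c(-Inf, 20, 25, 30, Inf),
                 right = FALSE,
                 labels = c("<20", "20-25", "25-30", ">=30"))

table(bmi_group)
```

With `right = FALSE`, a value of exactly 30 falls in the `>=30` group and a value of exactly 20 falls in `20-25`.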
2005 Jun 06
9
R Graph Gallery : categorization of the graphs
Hello all, It seems that the next improvement to the R Graph Gallery is categorization of the graphics, that way each graph will be easier to find. That step should be done *carefully* if we want to avoid the opposite side-effect : graph not reachable through the categories. That's why the wisdom of the R community is required. Graphics will be classified in : - categories -
2009 Jul 30
4
truncating values into separate categories
Hi all, Simple question which I thought I had the answer to, but it isn't so simple for some reason. I am sure someone can easily help. I would like to categorize the values in NP into one of the five values in "Per", with the last category ("4") representing values >=4 (hence 4:max(NP)). The problem is that R is reading max(NP) as multiple values instead of a range so the
2010 Sep 05
4
converting string vector to integer/numeric vector
Hi, Is it possible to convert a string vector to integer or numeric vector? In my situation I receive data in a string vector and have to convert it based on a given type. -- Rajesh.J
2013 Sep 22
2
Coding several dummy variables into a single categorical variable
Colleagues, I have generated several dummy variables: n$native0 <- 1 * (n$re=="white" & n$usborn=="yes") n$native1 <- 1 * (n$re=="afam" & n$usborn=="yes") n$native2 <- 1 * (n$re=="carib" & n$usborn=="yes") n$native3 <- 1 * (n$re=="carib" & n$usborn=="no") I would now like to combine these
2012 Nov 27
1
Using factor variables with overlapping categories
Dear folks, I have a question, though it is more of a logic- or a good practices-question than a programming question per se. I am working with data from the American Community Survey summary file. It is mainly categorical count data. Currently I am working with about 40 tables covering about 35 variables, mainly in two-way tables, with some 3-way and a handful of four-way tables. I am going to
2011 Mar 08
3
A plot similar to violin plot
Dear R Users, I would like to know whether there is any package to create a plot like this: http://dl.dropbox.com/u/5409929/cs1160521f01.gif The X axis is categorical, and the positions of the points correspond to the frequency (similar to a violin plot). Thank you. Regards, CH -- CH Chan
2005 Mar 23
0
Error: Can not handle categorical predictors with more than 32 categories.
It always helps to check whether you got the data into R correctly. Hint: What does str(credit) tell you? Andy > From: Melanie Vida > Hi All, > My question is in regards to an error generated when using randomForest in R. Is there a special way to format the data in order to avoid this error, or am I completely confused on what the error implies?
2004 Apr 27
1
coding of categories in rpart
Hello, I am using rpart to derive classification rules for customer segments. I have a few categorical variables in the set of independent variables. For instance, Account Size can be (Very-Small, Small, Medium, Large, V-Large). Rpart seems to encode these categories into: a,b,c,d,e. The results are expressed in terms of the encoded values. How do I find out what encoding was used by rpart?
2003 Sep 08
1
problems with categorical variables
Hi All: I am working on a dataset of a study on healthcare workers. One of the variables I am studying is a categorical variable (variable name:EDUC, indicates educational achievement, with 6 levels: "illiterate", "primary", "junior high school", "high school completed", "undergraduate", and "postgraduate"). I want to collapse the
2009 Aug 11
1
Categorizing Lines
Hi all, I have a dataset of 3D coordinates and can't figure out how to tell R which ones are the individuals. I have 3 columns, which I named x, y and z, and 2607 lines, but each specimen is 33 lines (79 specimens). How can I tell R to categorize individuals every 33 lines? Thanks in advance
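A grouping variable for fixed-size blocks of rows, as described in this question, might be built along these lines. The `coords` data frame and its random values are hypothetical stand-ins; only the dimensions (79 specimens of 33 lines each, 2607 lines in total) come from the question.

```r
# Hypothetical stand-in for the real coordinates: 79 specimens x 33 rows.
n_specimens <- 79
rows_per_specimen <- 33
set.seed(1)
coords <- data.frame(x = rnorm(n_specimens * rows_per_specimen),
                     y = rnorm(n_specimens * rows_per_specimen),
                     z = rnorm(n_specimens * rows_per_specimen))

# Label rows 1-33 as specimen 1, rows 34-66 as specimen 2, and so on.
coords$specimen <- factor(rep(seq_len(n_specimens), each = rows_per_specimen))

# One data frame per specimen, if a list is more convenient.
by_specimen <- split(coords[, c("x", "y", "z")], coords$specimen)
```

This assumes the rows are already ordered by specimen, which is what the question implies.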
2012 May 08
1
Regression with very high number of categorical variables
Dear all, I would like to run a simple regression model y~x1+x2+x3+... The problem is that I have a lot of independent variables (xi) -- around one hundred -- and that some of them are categorical with a lot of categories (like, for example, ZIP code). One straightforward way would be to (a) transform all categorical variables into 1/0 dummies and (b) enter all the variables into an lm model.
2010 Sep 17
2
grouping dataframe entries using a categorical variable
Dear R Users, I have a problem which I think you might be able to help with. I have a dataframe which I'm trying to "filter" following different groups I specified. It's a little hard to explain, so here is an example: My dataframe: ESS DHP 1 EPB 22 2 SAB 10 3 SAB 20 4 BOJ 14 5 ERS 28 11 SAB 10 12 SAB 22 13 BOJ 26 20 SAB 10 21 SAB 22 22 BOJ 32 29 SAB 14 30 SAB
2017 Aug 18
1
Meta-regression of categorical variables
Dear metafor users, I am working on a meta-analysis of reliability and the correlation associations. I need some help with categorical moderator variables. Question 1: How to conduct the weighted ANOVAs assuming a mixed-effects model on the transformed alpha coefficients/the transformed correlation coefficients for the categorical moderator variables? Question 2: How to