Displaying 20 results from an estimated 10000 matches similar to: "how to collapse categories or re-categorize variables?"
2007 Nov 24
1
Project proposal/idea: Categorize traffic by behavior
Back in 2003/2004, when choosing the topic for my master's thesis, I had a
secondary project idea. Perhaps it's about time to do something about the
idea and hear whether anyone else thinks it's a good one.
The basic idea is to: "Categorize traffic by behavior"
The categorization should be based upon things like packet timing
characteristics and packet size, rather than standard port
2010 Nov 10
2
randomForest can not handle categorical predictors with more than 32 categories
I received this error:
Error in randomForest.default(m, y, ...) :
Can not handle categorical predictors with more than 32 categories.
using the code below:
library(randomForest)
library(MASS)
memory.limit(size=12999)
x <- read.csv("D:/train_store_title_view.csv", header=TRUE)
x <- na.omit(x)
set.seed(131)
sales.rf <- randomForest(sales ~ ., data=x, mtry=3,
importance=TRUE)
My
2011 Mar 09
2
collapse a data column into a row
I have a file with data in columnar format, like below:
probeID
rc_AI104113_at
rc_AI178259_f_at
rc_AI179134_i_at
rc_AI179134_f_at
rc_AI104113_at
rc_AA819429_f_at
How can I rewrite it in the format below:
'rc_AI104113_at', 'rc_AI178259_f_at', 'rc_AI179134_i_at',
'rc_AI179134_f_at', 'rc_AI104113_at', 'rc_AA819429_f_at'
Is there any function to do
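A minimal sketch for the question above, assuming the probe IDs are already in a character vector (the name `probe_ids` is made up for illustration; how the file is read is not shown in the post):

```r
# Hypothetical input: the probe IDs from the question as a character vector.
probe_ids <- c("rc_AI104113_at", "rc_AI178259_f_at", "rc_AI179134_i_at",
               "rc_AI179134_f_at", "rc_AI104113_at", "rc_AA819429_f_at")

# Wrap each ID in single quotes, then join all of them with ", ".
one_row <- paste0("'", probe_ids, "'", collapse = ", ")
cat(one_row, "\n")
```

With `collapse` set, `paste0()` returns a single string rather than one string per element.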
2004 Feb 26
3
Collapsing Categorical Variables
Hi,
Suppose I have a categorical variable called STREET, and I have 30
levels for it (i.e. 30 different streets). I want to find all those
streets with only 15 observations or below then collapse them into a
level called OTHER. Is there a quick way, other than using a for()
loop, to do it? Currently what I'm doing is something like:
### Collapse STREET (those < 15)
st <- c()
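One loop-free way to do this, sketched on made-up data (the street names and counts are assumptions, not the poster's data), is to count the levels with `table()` and rename the rare ones in a single assignment:

```r
# Hypothetical data: a factor in which some levels are rare.
street <- factor(c(rep("Main St", 40), rep("Oak Ave", 20),
                   rep("Elm Rd", 15), rep("Pine Ln", 3)))

counts <- table(street)
rare <- names(counts)[counts <= 15]   # levels with 15 observations or fewer

# Giving several levels the same name merges them into one level.
collapsed <- street
levels(collapsed)[levels(collapsed) %in% rare] <- "OTHER"
table(collapsed)
```

The threshold `<= 15` follows the wording "15 observations or below"; use `< 15` if that is what is meant.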
2005 Mar 22
2
Error: Can not handle categorical predictors with more than 32 categories.
Hi All,
My question is in regards to an error generated when using randomForest
in R. Is there a special way to format the data in order to avoid this
error, or am I completely confused on what the error implies?
"Error in randomForest.default(m, y, ...) :
Can not handle categorical predictors with more than 32 categories."
This is generated from the command line:
>
2011 Mar 01
2
regression with categorical nuisance variable
Hi,
I am new to R, so I am unsure of the formula to set up this analysis.
I would like to run a linear model with a continuous dependent
variable (brain volume) and a continuous independent variable (age)
while controlling for a categorical nuisance variable (gender).
Age and brain volume are correlated.
There are no gender differences in age but there are significant
gender differences in brain
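A common way to set this up, sketched here on simulated data (all variable names and numbers below are assumptions for illustration), is to add the nuisance factor as a covariate in the `lm()` formula:

```r
# Simulated stand-in data; not the poster's data set.
set.seed(42)
n <- 60
dat <- data.frame(
  age    = runif(n, 20, 80),
  gender = factor(rep(c("F", "M"), each = n / 2))
)
dat$volume <- 1500 - 3 * dat$age + 100 * (dat$gender == "M") + rnorm(n, sd = 20)

# The age coefficient is the age effect adjusted for gender.
fit <- lm(volume ~ age + gender, data = dat)
summary(fit)$coefficients
```

Whether an additive adjustment is enough, or an `age:gender` interaction is needed, depends on the data and is not something this sketch decides.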
2012 Jan 26
2
How do I use the cut function to assign specific cut points?
I am new to R, and I am trying to cut a continuous variable BMI into
different categories and can't figure out how to use it. I would like to cut
it into four groups: <20, 20-25, 25-30 and >= 30. I am having difficulty
figuring out the code for <20 and >=30. Please help. Thank you.
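A short sketch with `cut()`, assuming the groups are meant to be closed on the left (so exactly 20 falls in 20-25 and exactly 30 falls in >=30); the BMI values are made up:

```r
bmi <- c(17.5, 20, 24.9, 25, 29.9, 30, 41)   # made-up values

# -Inf and Inf give the open-ended first and last groups;
# right = FALSE makes each interval [lower, upper).
bmi_group <- cut(bmi,
                 breaks = c(-Inf, 20, 25, 30, Inf),
                 right  = FALSE,
                 labels = c("<20", "20-25", "25-30", ">=30"))
table(bmi_group)
```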
2005 Jun 06
9
R Graph Gallery : categorization of the graphs
Hello all,
It seems that the next improvement to the R Graph Gallery is
categorization of the graphics, that way each graph will be easier to
find. That step should be done *carefully* if we want to avoid the
opposite side effect: graphs not reachable through the categories.
That's why the wisdom of the R community is required.
Graphics will be classified in:
- categories
-
2009 Jul 30
4
truncating values into separate categories
Hi all,
A simple question which I thought I had the answer to, but it isn't so simple
for some reason. I am sure someone can easily help. I would like to categorize
the values in NP into one of the five values in "Per", with the last
category ("4") representing values >= 4 (hence 4:max(NP)). The problem is that
R is reading max(NP) as multiple values instead of a range so the
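A possible sketch, assuming the five categories are 0, 1, 2, 3 and "4 or more" (the post is cut off, so this reading and the NP values are assumptions): capping the values with `pmin()` avoids spelling out the range 4:max(NP).

```r
NP <- c(0, 1, 2, 3, 4, 5, 9)   # made-up values

# Everything at or above 4 is capped at 4; smaller values are unchanged.
Per <- pmin(NP, 4)

# Optional: a factor with a readable label for the top category.
Per_f <- factor(Per, levels = 0:4, labels = c("0", "1", "2", "3", "4+"))
table(Per_f)
```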
2010 Sep 05
4
converting string vector to integer/numeric vector
Hi,
Is it possible to convert a string vector to an integer or numeric vector? In
my situation I receive data in a string vector and have to convert it based
on a given type.
--
Rajesh.J
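A minimal sketch of converting by a given type name; `convert_to` is a hypothetical helper written for this illustration, not an existing R function, and it only wraps the base functions `as.integer()` and `as.numeric()`:

```r
# Hypothetical helper: convert a character vector according to a type name.
convert_to <- function(x, type) {
  switch(type,
         integer = as.integer(x),
         numeric = as.numeric(x),
         stop("unsupported type: ", type))
}

convert_to(c("1", "2", "42"), "integer")
convert_to(c("1.5", "2"), "numeric")

# Strings that are not numbers become NA (with a warning).
suppressWarnings(as.numeric("abc"))
```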
2013 Sep 22
2
Coding several dummy variables into a single categorical variable
Colleagues,
I have generated several dummy variables:
n$native0 <- 1 * (n$re=="white" & n$usborn=="yes")
n$native1 <- 1 * (n$re=="afam" & n$usborn=="yes")
n$native2 <- 1 * (n$re=="carib" & n$usborn=="yes")
n$native3 <- 1 * (n$re=="carib" & n$usborn=="no")
I would now like to combine these
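One way this could be done, assuming the goal is a single variable coded 0 to 3 that records which dummy is set (the post is truncated, so that goal and the small data frame below are assumptions):

```r
# Made-up data with the same column names as in the post.
n <- data.frame(re     = c("white", "afam", "carib", "carib", "white"),
                usborn = c("yes",   "yes",  "yes",   "no",    "no"))
n$native0 <- 1 * (n$re == "white" & n$usborn == "yes")
n$native1 <- 1 * (n$re == "afam"  & n$usborn == "yes")
n$native2 <- 1 * (n$re == "carib" & n$usborn == "yes")
n$native3 <- 1 * (n$re == "carib" & n$usborn == "no")

dummies <- as.matrix(n[, c("native0", "native1", "native2", "native3")])

# Column position of the 1 in each row, shifted to start at 0;
# rows where no dummy (or more than one) is set become NA.
n$native <- ifelse(rowSums(dummies) == 1,
                   max.col(dummies, ties.method = "first") - 1,
                   NA)
n$native
```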
2012 Nov 27
1
Using factor variables with overlapping categories
Dear folks,
I have a question, though it is more of a logic- or a good
practices-question than a programming question per se. I am working with
data from the American Community Survey summary file. It is mainly
categorical count data. Currently I am working with about 40 tables covering
about 35 variables, mainly in two-way tables, with some 3-way and a handful
of four-way tables. I am going to
2011 Mar 08
3
A plot similar to violin plot
Dear R Users,
I would like to know whether there is any package to create a plot like this:
http://dl.dropbox.com/u/5409929/cs1160521f01.gif
The X axis is categorical, and the positions of the points
correspond to the frequency (similar to a violin plot).
Thank you.
Regards,
CH
--
CH Chan
2005 Mar 23
0
Error: Can not handle categorical predictors with more than 32 categories.
It always helps to check whether you got the data into R correctly. Hint:
What does str(credit) tell you?
Andy
> From: Melanie Vida
>
> Hi All,
>
> My question is in regards to an error generated when using randomForest
> in R. Is there a special way to format the data in order to avoid this
> error, or am I completely confused on what the error implies?
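Following the hint about `str()`, a rough sketch for spotting factor columns with too many levels. The data frame below is invented for illustration (it is not the poster's `credit` data), and an ID-like column read in as a factor is only one possible cause:

```r
# Invented stand-in data; the id column is deliberately read as a factor.
credit <- data.frame(
  id     = as.character(1:40),
  grade  = rep(c("A", "B"), 20),
  amount = seq(100, by = 10, length.out = 40),
  stringsAsFactors = TRUE
)
str(credit)

# Number of levels per factor column (NA for non-factors).
n_levels <- sapply(credit, function(col) {
  if (is.factor(col)) nlevels(col) else NA_integer_
})

# Names of the columns that exceed the 32-category limit in the error.
too_many <- names(n_levels)[!is.na(n_levels) & n_levels > 32]
too_many
```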
2004 Apr 27
1
coding of categories in rpart
Hello,
I am using rpart to derive classification rules for customer segments.
I have a few categorical variables in the set of independent variables.
For instance,
Account Size can be (Very-Small, Small, Medium, Large, V-Large)
Rpart seems to encode these categories into: a,b,c,d,e
The results are expressed in terms of the encoded values.
How do I find out what encoding was used by rpart?
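A hedged sketch: assuming the single-letter labels follow the order of the factor's levels ("a" for the first level, "b" for the second, and so on), a lookup table can be built from `levels()` alone. The factor below is made up from the categories named in the post:

```r
# Made-up factor using the categories named in the question.
account_size <- factor(c("Very-Small", "Small", "Medium", "Large", "V-Large"))

# Assumption: letter i corresponds to the i-th entry of levels().
key <- data.frame(code  = letters[seq_along(levels(account_size))],
                  level = levels(account_size))
key
```

The order of `levels()` is alphabetical by default and can depend on the locale, so it is worth printing `levels()` on the actual data rather than relying on the order shown in the question.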
2003 Sep 08
1
problems with categorical variables
Hi All:
I am working on a dataset of a study on healthcare workers. One of the
variables I am studying is a categorical variable (variable name:EDUC,
indicates educational achievement, with 6 levels: "illiterate", "primary",
"junior high school", "high school completed", "undergraduate", and
"postgraduate").
I want to collapse the
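A sketch of collapsing levels by assigning a named list to `levels()`. The grouping into "low", "middle" and "high" is purely an assumption for illustration, since the post is cut off before saying which levels should be merged:

```r
# Made-up values using the six levels listed in the question.
educ <- factor(c("illiterate", "primary", "junior high school",
                 "high school completed", "undergraduate", "postgraduate",
                 "primary"))

# Each list name is a new level; its value lists the old levels it absorbs.
educ3 <- educ
levels(educ3) <- list(
  low    = c("illiterate", "primary"),
  middle = c("junior high school", "high school completed"),
  high   = c("undergraduate", "postgraduate")
)
table(educ3)
```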
2009 Aug 11
1
Categorizing Lines
Hi all, I have a dataset of 3D coordinates and can't figure out how
to tell R which ones are the individuals:
I have 3 columns which I named x, y and z.
And then I have 2607 lines, but each specimen is 33 lines (79 specimens).
How can I tell R to categorize individuals every 33 lines?
Thanks in advance
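A small sketch, assuming the rows are already ordered so that each consecutive block of 33 lines is one specimen; the coordinates here are random placeholders:

```r
# Placeholder coordinates: 79 specimens x 33 lines = 2607 rows.
coords <- data.frame(x = rnorm(2607), y = rnorm(2607), z = rnorm(2607))

# gl(79, 33) generates a factor with 79 levels, each repeated 33 times.
coords$specimen <- gl(79, 33)

# One data frame per specimen, if that is more convenient.
by_specimen <- split(coords[, c("x", "y", "z")], coords$specimen)
length(by_specimen)
```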
2012 May 08
1
Regression with very high number of categorical variables
Dear all,
I would like to run a simple regression model y~x1+x2+x3+...
The problem is that I have a lot of independent variables (xi) -- around
one hundred -- and that some of them are categorical with a lot of
categories (like, for example, ZIP code). One straightforward way would be
to (a) transform all categorical variables into 1/0 dummies and (b) enter
all the variables into an lm model.
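For step (a), a brief sketch on toy data (the variable names and values are invented): when a column is a factor, `lm()` builds the 1/0 dummies itself, and `model.matrix()` shows what they look like.

```r
# Toy data; zip stands in for a categorical variable with many categories.
dat <- data.frame(
  y   = c(1, 3, 2, 5, 4, 6),
  x1  = c(0.1, 0.4, 0.3, 0.9, 0.7, 1.0),
  zip = factor(c("10001", "10002", "10003", "10001", "10002", "10003"))
)

# The design matrix lm() would use: one dummy per level except the first.
X <- model.matrix(y ~ x1 + zip, data = dat)
colnames(X)

fit <- lm(y ~ x1 + zip, data = dat)
coef(fit)
```

This only shows the mechanics; with around one hundred predictors and factors as large as ZIP code, the number of dummy columns grows quickly, which is the concern raised in the post.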
2010 Sep 17
2
grouping dataframe entries using a categorical variable
Dear R Users,
I have a problem which I think you might be able to help with. I have a dataframe which I'm trying to "filter" following different groups I specified. It's a little hard to explain, so here is an example:
My dataframe:
ESS DHP
1 EPB 22
2 SAB 10
3 SAB 20
4 BOJ 14
5 ERS 28
11 SAB 10
12 SAB 22
13 BOJ 26
20 SAB 10
21 SAB 22
22 BOJ 32
29 SAB 14
30 SAB
2017 Aug 18
1
Meta-regression of categorical variables
Dear metafor users,
I am working on a meta-analysis of reliability and correlation coefficients.
I need some help with analyzing categorical moderator variables.
Question 1: How to conduct the weighted ANOVAs assuming a mixed-effects model on the transformed alpha coefficients / the transformed correlation coefficients for the categorical moderator variables?
Question 2: How to