thr3ads.net - R help - [R] Convert character string to top levels + NAN [Apr 2010]

Michael Haenlein

2010-Apr-22 09:16 UTC

[R] Convert character string to top levels + NAN

Dear all,

I have several character strings with a high number of different levels.
unique(x) gives me values in the range of 100-200.
This creates problems as I would like to use them as predictors in a coxph
model.

I therefore would like to convert each of these strings to a new string
(x_new).
x_new should be equal to x for the top n categories (i.e. the top n levels
with the highest occurrence) and NAN elsewhere.
For example, for n=3 x_new would have three levels: The three most common
levels of x + NAN.

Is there some convenient way of doing this?

Thanks in advance,

Michael


Michael Haenlein
Associate Professor of Marketing
ESCP Europe
Paris, France


David Winsemius

2010-Apr-22 13:21 UTC

On Apr 22, 2010, at 5:16 AM, Michael Haenlein wrote:
> Dear all,
>
> I have several character strings with a high number of different levels.
> unique(x) gives me values in the range of 100-200.
> This creates problems as I would like to use them as predictors in a coxph
> model.
>
> I therefore would like to convert each of these strings to a new string
> (x_new).
> x_new should be equal to x for the top n categories (i.e. the top n levels
> with the highest occurrence) and NAN elsewhere.
> For example, for n=3 x_new would have three levels: The three most common
> levels of x + NAN.
>
> Is there some convenient way of doing this?

  x <- sample(c("top", "three", "levels", "other", "strings"), 30,
              replace=TRUE, prob=c(.3,.3,.3,.1,.1))
  y <- c("top", "three", "levels")   # the levels to keep
  xnew <- x
  xnew[ !xnew %in% y ] <- "NAN"  # the string "NAN", not the same as NaN
  table(xnew)

#--------
xnew
levels    NAN  three    top
     5      5      9     11

-- 
David.
David Winsemius, MD
West Hartford, CT
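
The reply above hard-codes the levels to keep in y. When the top n levels are not known in advance, they could be read off a frequency table instead. The following is only a sketch under stated assumptions: keep_top_levels and its arguments n and other are hypothetical names that do not appear in the thread, and ties between equally frequent levels are resolved by whatever order sort() returns.

```r
# Hypothetical helper (not from the thread): keep the n most frequent
# values of a character vector and replace everything else with `other`.
keep_top_levels <- function(x, n = 3, other = "NAN") {
  counts <- sort(table(x), decreasing = TRUE)      # frequencies, most common first
  top <- names(counts)[seq_len(min(n, length(counts)))]
  x_new <- x
  x_new[!x_new %in% top] <- other                  # same idiom as in the reply
  x_new
}

set.seed(1)  # arbitrary seed, only so the sample is repeatable
x <- sample(c("top", "three", "levels", "other", "strings"), 30,
            replace = TRUE, prob = c(.3, .3, .3, .1, .1))
x_new <- keep_top_levels(x, n = 3)
table(x_new)
```

Passing other = NA_character_ would mark the remaining values as missing rather than as the literal string "NAN". The two choices behave differently in a model: the string acts as one more category, while model-fitting functions usually drop rows that contain NA.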
