thr3ads.net - R help - [R] Recoding lists of categories of a variable [Oct 2016]

Bert Gunter

2016-Oct-10 17:39 UTC

[R] Recoding lists of categories of a variable

Well, I think that's kind of overkill.

Assuming "oldvar" is a factor in the data frame mydata, then the
following shows how to do it:
> set.seed(27)
> d <- data.frame(a = sample(c(letters[1:3],NA),15,replace = TRUE))
> d
      a
1  <NA>
2     a
3  <NA>
4     b
5     a
6     b
7     a
8     a
9     a
10    a
11    c
12 <NA>
13    c
14    c
15 <NA>

> d$b <- factor(d$a,labels = LETTERS[3:1])
> d
      a    b
1  <NA> <NA>
2     a    C
3  <NA> <NA>
4     b    B
5     a    C
6     b    B
7     a    C
8     a    C
9     a    C
10    a    C
11    c    A
12 <NA> <NA>
13    c    A
14    c    A
15 <NA> <NA>


See ?factor for details.

Incidentally note that in the OP's post,

mydata$newvar[oldvar = "topic1"] <- "parenttopic"

is completely incorrect; it should probably be:

mydata$newvar[mydata$oldvar == "topic1"] <- "parenttopic"

This suggests to me that the OP would probably find it useful to spend
some time with one or more of the many good R tutorials on the web.
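
For the original question (several categories mapped to one parent, with missing values left missing), a minimal sketch along the same lines might look like the following. The data and the exact category spellings are made up for illustration; only the names mydata, oldvar and newvar come from the thread. It uses %in% rather than ==, which is an editorial choice because %in% returns FALSE instead of NA for missing values.

```r
# Hypothetical data; only the names mydata, oldvar and newvar are from the thread
mydata <- data.frame(
  oldvar = c("topic1", "topic2", NA, "topic9", "topic14", "topic3"),
  stringsAsFactors = FALSE
)

# Categories to be collapsed into one parent category (assumed spellings)
parents <- c("topic1", "topic9", "topic14")

# Start from a copy, so unlisted categories keep their value and NA stays NA
mydata$newvar <- mydata$oldvar

# %in% is FALSE for NA, so the missing row is left untouched
mydata$newvar[mydata$oldvar %in% parents] <- "parenttopic"

print(mydata)
```

If oldvar is a factor rather than a character vector, converting with as.character() first avoids assigning a value that is not yet among the levels.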

Cheers,
Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Mon, Oct 10, 2016 at 9:08 AM, S Ellison <S.Ellison at lgcgroup.com> wrote:
>> Is there a convenient way to edit this code to allow me to recode a list of
>> categories 'topic 1', 'topic 9' and 'topic 14', say, of the old variable
>> 'oldvar' as 'parenttopic' by means of the new variable 'newvar', while also
>> mapping system missing values to system missing values?
>
> You could look at 'recode()' in the car package.
>
> There's a fair description of other options at
> http://www.uni-kiel.de/psychologie/rexrepos/posts/recode.html
>
> S Ellison

S Ellison

2016-Oct-10 23:32 UTC

[R] Recoding lists of categories of a variable

> Well, I think that's kind of overkill.

Depends whether you want to recode all or some, and how robust you want the
answer to be.
recode() allows you to recode a few levels of many, without dependence on level
ordering; that's kind of neat.

tbh, though, I don't use recode() a lot; I generally find myself needing to
change a fair proportion of level labels.

But I do get nervous about relying on specific ordering; it can break without
visible warning if the data change (e.g. if you lose a factor level with a
slightly different data set, integer indexing will give you apparently valid
reassignment to the wrong new codes). So I tend to go via named vectors even if
it costs me a lot of typing. For example, to change

lcase <- c('a', 'b', 'c')

to c('B', 'A', 'C') I'll use something like

c(a='B', b='A', c='C')[lcase] 

or, if lcase were a factor, 
c(a='B', b='A', c='C')[as.character(lcase)] 

Unlike using the numeric levels, that doesn't fail if some of the levels I
expect are absent; it only fails (and does so visibly) when there's a value
in there that I haven't assigned a coding to. So it's a tad more robust.
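
To make the contrast concrete, here is a small sketch comparing lookup by name with lookup by integer code when a level is absent. The data are invented for illustration, and the names map, by_name, by_code and unmapped are not from the thread.

```r
# Lookup table: names are the old values, elements are the new labels
map <- c(a = "B", b = "A", c = "C")

# Hypothetical data in which level 'b' happens to be absent
lcase <- factor(c("a", "c", "a"))

# Lookup by name: each value finds its own label regardless of level order
by_name <- unname(map[as.character(lcase)])

# Lookup by integer code: 'c' is now level 2, so it silently gets b's label
by_code <- unname(map[as.integer(lcase)])

print(by_name)   # "B" "C" "B"
print(by_code)   # "B" "A" "B"

# A value with no entry in the table comes back as NA
unmapped <- unname(map[c("a", "d")])
print(unmapped)  # "B" NA
```

The NA for the unmapped value is what makes the problem visible; an explicit check such as anyNA() on the result could turn it into a hard error if that is preferred.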

Steve E


Bert Gunter

2016-Oct-11 03:49 UTC

[R] Recoding lists of categories of a variable

Still overkill, I believe.


"Unlike using the numeric levels, that doesn't fail if some of the
levels I expect are absent; it only fails (and does so visibly) when
there's a value in there that I haven't assigned a coding to. So it's
a tad more robust."


If you are concerned about missing levels -- which I agree is
legitimate -- then the following simple modification works (for
**factors** of course):
> d <- factor(letters[1:2],levels= letters[1:3])
> d
[1] a b
Levels: a b c
> f <- factor(d,levels = levels(d), labels = LETTERS[3:1])
> f
[1] C B
Levels: C B A

## No levels lost !
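
A related sketch, assuming the incoming data are plain character values rather than a factor that already carries the full level set: spelling out the expected levels explicitly keeps the level/label pairing fixed whatever turns up in the data. The vectors expected and x are invented for illustration.

```r
# Levels we expect to see, written out rather than taken from the data
expected <- c("a", "b", "c")

# Hypothetical data: 'b' is absent and an unexpected 'd' has appeared
x <- c("a", "c", "d")

# Labels are paired with 'expected' by position, not with whatever x contains
f <- factor(x, levels = expected, labels = c("C", "B", "A"))

print(f)
# 'a' -> C, 'c' -> A, and the unexpected 'd' becomes NA
print(levels(f))
```

The unexpected value shows up as NA rather than being paired with the wrong label, and the unused level B is still present.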

Does that allay your concerns?

Cheers,
Bert

peter dalgaard

2016-Oct-11 09:59 UTC

[R] Recoding lists of categories of a variable

On 11 Oct 2016, at 01:32 , S Ellison <S.Ellison at LGCGroup.com> wrote:
>> Well, I think that's kind of overkill.
>
> Depends whether you want to recode all or some, and how robust you want the
> answer to be.
> recode() allows you to recode a few levels of many, without dependence on
> level ordering; that's kind of neat.
>
> tbh, though, I don't use recode() a lot; I generally find myself needing
> to change a fair proportion of level labels.
>
> But I do get nervous about relying on specific ordering; it can break
> without visible warning if the data change (e.g. if you lose a factor level
> with a slightly different data set, integer indexing will give you
> apparently valid reassignment to the wrong new codes). So I tend to go via
> named vectors even if it costs me a lot of typing. For example, to change
>
> lcase <- c('a', 'b', 'c')
>
> to c('B', 'A', 'C') I'll use something like
>
> c(a='B', b='A', c='C')[lcase]
>
> or, if lcase were a factor,
>
> c(a='B', b='A', c='C')[as.character(lcase)]

Notice that similar functionality is available via levels<-() (see help page
for more features)

> f <- factor(c("a","b","c"))
> levels(f) <- list(A="a", B="b", C="c")
> f
[1] A B C
Levels: A B C

The main advantage of this is that you control the level ordering, and also that
you don't quite as easily get caught out by unused levels:
> f <- factor(c("a","c"))
> levels(f) <- list(A="a", B="b", C="c")
> table(f)
f
A B C 
1 0 1 

(in which the 0 count might be important).
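
The list form of levels<-() also accepts several old levels per new name, which seems to fit the original question of collapsing a few topics into one parent. A minimal sketch with invented data follows; the topic spellings are assumptions. As far as I understand ?levels, old levels that are not mentioned in the list become NA, which is why topic2 is listed explicitly here.

```r
# Hypothetical factor with a missing value; topic spellings are assumed
oldvar <- factor(c("topic1", "topic2", NA, "topic9", "topic14"))

# Work on a copy so the old variable is kept
newvar <- oldvar

# Each list element names a new level and gives the old levels it absorbs.
# Old levels left out of the list would turn into NA, so topic2 is listed too.
levels(newvar) <- list(
  parenttopic = c("topic1", "topic9", "topic14"),
  topic2      = "topic2"
)

print(newvar)
print(table(newvar, useNA = "ifany"))
```

The missing value in oldvar stays missing in newvar, and the level order follows the order of the list.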

-pd
> 
> Unlike using the numeric levels, that doesn't fail if some of the
> levels I expect are absent; it only fails (and does so visibly) when there's
> a value in there that I haven't assigned a coding to. So it's a tad more
> robust.
> 
> Steve E
-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com
