Well, I think that's kind of overkill.

Assuming "oldvar" is a factor in the data frame mydata, then the following shows how to do it:

> set.seed(27)
> d <- data.frame(a = sample(c(letters[1:3],NA),15,replace = TRUE))
> d
      a
1  <NA>
2     a
3  <NA>
4     b
5     a
6     b
7     a
8     a
9     a
10    a
11    c
12 <NA>
13    c
14    c
15 <NA>
> d$b <- factor(d$a,labels = LETTERS[3:1])
> d
      a    b
1  <NA> <NA>
2     a    C
3  <NA> <NA>
4     b    B
5     a    C
6     b    B
7     a    C
8     a    C
9     a    C
10    a    C
11    c    A
12 <NA> <NA>
13    c    A
14    c    A
15 <NA> <NA>

See ?factor for details.

Incidentally, note that in the OP's post,

mydata$newvar[oldvar = "topic1"] <- "parenttopic"

is completely incorrect; it should probably be:

mydata$newvar[mydata$oldvar == "topic1"] <- "parenttopic"

This suggests to me that the OP would probably find it useful to spend some time with one or more of the many good R tutorials on the web.

Cheers,
Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)

On Mon, Oct 10, 2016 at 9:08 AM, S Ellison <S.Ellison at lgcgroup.com> wrote:
>> Is there a convenient way to edit this code to allow me to recode a list of
>> categories 'topic 1', 'topic 9' and 'topic 14', say, of the old variable 'oldvar'
>> as 'parenttopic' by means of the new variable 'newvar', while also mapping
>> system missing values to system missing values?
>
> You could look at 'recode()' in the car package.
>
> There's a fair description of other options at http://www.uni-kiel.de/psychologie/rexrepos/posts/recode.html
>
> S Ellison
>
> *******************************************************************
> This email and any attachments are confidential. Any u...{{dropped:8}}
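The corrected assignment above handles a single category. A minimal sketch of how it might be extended to the several categories the OP asked about, with missing values left missing, is given here. The data frame contents and the name to_merge are hypothetical, made up for illustration, and oldvar is assumed to be a character column rather than a factor.

```r
# Hypothetical data: the values below are invented for illustration only.
mydata <- data.frame(
  oldvar = c("topic1", "topic2", NA, "topic9", "topic14", "topic3"),
  stringsAsFactors = FALSE
)

# Start from a copy, so untouched categories and NA carry over unchanged.
mydata$newvar <- mydata$oldvar

# %in% returns FALSE (never NA) for missing values, so NA rows are
# simply not selected and stay NA in newvar.
to_merge <- c("topic1", "topic9", "topic14")
mydata$newvar[mydata$oldvar %in% to_merge] <- "parenttopic"

print(mydata)
```

If oldvar is a factor, converting with as.character() first avoids assigning a value that is not yet among the factor's levels.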
> Well, I think that's kind of overkill.

Depends whether you want to recode all or some, and how robust you want the answer to be. recode() allows you to recode a few levels of many, without dependence on level ordering; that's kind of neat.

tbh, though, I don't use recode() a lot; I generally find myself needing to change a fair proportion of level labels.

But I do get nervous about relying on specific ordering; it can break without visible warning if the data change (e.g. if you lose a factor level with a slightly different data set, integer indexing will give you apparently valid reassignment to the wrong new codes). So I tend to go via named vectors even if it costs me a lot of typing. For example, to change

lcase <- c('a', 'b', 'c')

to c('B', 'A', 'C') I'll use something like

c(a='B', b='A', c='C')[lcase]

or, if lcase were a factor,

c(a='B', b='A', c='C')[as.character(lcase)]

Unlike using the numeric levels, that doesn't fail if some of the levels I expect are absent; it only fails (and does so visibly) when there's a value in there that I haven't assigned a coding to. So it's a tad more robust.

Steve E
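A small sketch of the named-vector lookup described above, showing the three behaviours mentioned: an absent level is harmless, an unmapped value shows up as NA, and a factor needs as.character() because it would otherwise index by its integer codes. The names map, lcase2 and recoded are hypothetical and used only for illustration.

```r
map <- c(a = "B", b = "A", c = "C")

# A level that is absent from the data ("b" here) is harmless.
lcase <- c("a", "c", "a")
print(unname(map[lcase]))

# A value with no mapping ("d") comes back as NA rather than a wrong label.
lcase2 <- c("a", "d")
recoded <- unname(map[lcase2])
print(recoded)
print(anyNA(recoded))

# Why as.character() matters: a factor used as an index is treated as its
# integer codes. Here the levels are "a", "c", so the codes are 2, 1.
f <- factor(c("c", "a"))
print(unname(map[f]))                # picks elements 2 and 1 of map
print(unname(map[as.character(f)]))  # looks up by name, as intended
```

A check such as stopifnot(!anyNA(recoded)) after the lookup is one way to turn the unmapped-value case into a hard error.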
Still overkill, I believe.

"Unlike using the numeric levels, that doesn't fail if some of the levels I expect are absent; it only fails (and does so visibly) when there's a value in there that I haven't assigned a coding to. So it's a tad more robust."

If you are concerned about missing levels -- which I agree is legitimate -- then the following simple modification works (for **factors** of course):

> d <- factor(letters[1:2],levels= letters[1:3])
> d
[1] a b
Levels: a b c
> f <- factor(d,levels = levels(d), labels = LETTERS[3:1])
> f
[1] C B
Levels: C B A

## No levels lost !

Does that allay your concerns?

Cheers,
Bert
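To make the two positions concrete, here is a hedged sketch contrasting integer indexing on levels inferred from the data with the explicit levels= approach shown above. The variable names (new_labels, x, wrong, right) and the sample data are hypothetical.

```r
new_labels <- LETTERS[3:1]   # intended mapping: a -> C, b -> B, c -> A
x <- c("a", "c")             # level "b" happens to be absent from these data

# Levels inferred from the data are "a", "c", so the integer codes are 1, 2
# and "c" silently picks up the label meant for "b".
wrong <- new_labels[as.integer(factor(x))]
print(wrong)

# With the levels stated explicitly, the pairing of old level and new label
# no longer depends on which values occur in the data.
right <- factor(x, levels = letters[1:3], labels = new_labels)
print(as.character(right))
print(levels(right))
```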
On 11 Oct 2016, at 01:32 , S Ellison <S.Ellison at LGCGroup.com> wrote:

>> Well, I think that's kind of overkill.
> Depends whether you want to recode all or some, and how robust you want the answer to be.
> recode() allows you to recode a few levels of many, without dependence on level ordering; that's kind of neat.
>
> tbh, though, I don't use recode() a lot; I generally find myself need to change a fair proportion of level labels.
>
> But I do get nervous about relying on specific ordering; it can break without visible warning if the data change (eg if you lose a factor level with a slightly different data set, integer indexing will give you apparently valid reassignment to the wrong new codes). So I tend to go via named vectors even if it costs me a lot of typing. For example to change
> lcase<-c('a', 'b', 'c')
>
> to c('B', 'A', 'C') I'll use something like
>
> c(a='B', b='A', c='C')[lcase]
>
> or, if lcase were a factor,
> c(a='B', b='A', c='C')[as.character(lcase)]

Notice that similar functionality is available via levels<-() (see help page for more features)

> f <- factor(c("a","b","c"))
> levels(f) <- list(A="a", B="b", C="c")
> f
[1] A B C
Levels: A B C

The main advantage of this is that you control the level ordering, and also that you don't quite as easily get caught out by unused levels:

> f <- factor(c("a","c"))
> levels(f) <- list(A="a", B="b", C="c")
> table(f)
f
A B C
1 0 1

(in which the 0 count might be important).

-pd

>
> Unlike using the numeric levels, that doesn't fail if some of the levels I expect are absent; it only fails (and does so visibly) when there's a value in there that I haven't assigned a coding to. So it's a tad more robust.
>
> Steve E
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com
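As a possible illustration of the levels<-() approach applied to the original question, the list form also accepts several old levels under one new name. This is a sketch under assumptions: the data are hypothetical, and every existing level is listed explicitly in the list.

```r
# Hypothetical factor standing in for the OP's 'oldvar'.
oldvar <- factor(c("topic1", "topic2", NA, "topic9", "topic14"))

newvar <- oldvar
# Each list element names a new level and gives the old level(s) it absorbs.
# Every existing level appears somewhere in the list.
levels(newvar) <- list(
  parenttopic = c("topic1", "topic9", "topic14"),
  topic2      = "topic2"
)

print(as.character(newvar))  # NA in oldvar stays NA in newvar
print(levels(newvar))        # level order follows the order of the list
```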