Bert Gunter
2011-Oct-24 15:59 UTC
[R] How to create a new variable based on parts of another character variable: A generalization
... Well, this works in this simple case, but is too clumsy for a general formulation of this problem: given a "dictionary" consisting of two character vectors of unique "names" (or two columns in a data frame), x and y, how does one convert a factor z with levels in x into the corresponding equivalent with levels in y?

There are likely a zillion ways to do this with various packages and functions, but the simplest and most straightforward must surely be:

    factor(y[z])

Example:

    > x <- LETTERS[1:4]
    > y <- LETTERS[5:8]
    > z <- factor(sample(x,15, rep=TRUE))
    > z
     [1] B D A C B A B D A D D A A D B
    Levels: A B C D
    > factor(y[z])
     [1] F H E G F E F H E H H E E H F
    Levels: E F G H

This is a nice example of the utility of the factor data structure, which tends to get dissed a lot, because it can badly burn you if you're not careful with it.

A fuller discussion of these issues can be found by searching on "associative arrays" or "hashes", of which factors are an elementary example.

-- Bert

On Mon, Oct 24, 2011 at 6:00 AM, Petr PIKAL <petr.pikal@precheza.cz> wrote:

> Hi
>
> If you want to get rid of regular expressions at all and your A values
> start AWI for Arctic and UFT for boreal you can
>
> DF$D <- ifelse(substr(DF$A, 1,1) == "A", "Arctic", "Boreal")
>
> Regards
> Petr
>
> > Hello,
> > I am just starting with R and I am having a (most probably) stupid problem
> > by creating a new variable in a data.frame based on a part of another
> > character variable.
> >
> > I have a data frame like this one:
> >
> > A            B    C
> > AWI-test1    1    i
> > AWI-test5    2    r
> > AWI-tes75    56   z
> > UFT-2        5    I
> > UFT56        f    t
> > UFT356       9j   t
> > etc. etc.    89   t
> >
> > I now want to look in the variable A if the string AWI is present and then
> > create a variable D and putting "Arctic" inside. However, if the string
> > UFT occurs in the variable A, then the variable D shall be "Boreal" etc. etc.
> >
> > The resulting data.frame file should look like
> >
> > A            B    C    D
> > AWI-test1    1    i    Arctic
> > AWI-test5    2    r    Arctic
> > AWI-tes75    56   z    Arctic
> > UFT-2        5    I    Boreal
> > UFT56        f    t    Boreal
> > UFT356       9j   t    Boreal
> > etc. etc.    89   t
> >
> > I know how to do this when I want to look for the entire string of A, means
> > when there is "AWI-test1" and then create the variable D with "Arctic", but
> > not how to look only for a substring in A?
> > Would be great if somebody might help.
> > Thanks
> > Philipp

--
Bert Gunter
Genentech Nonclinical Biostatistics
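A minimal sketch of the dictionary lookup discussed in this message, for readers who want to try it. One assumption is worth stating loudly: factor(y[z]) indexes y by the integer codes of z, so it gives the intended result only when levels(z) line up position by position with x, as they do in the example (x is sorted and every value of x occurs in z). The vectors below are made up for illustration and deliberately break that alignment; match(), a named vector, and the levels/labels arguments of factor() are base R ways to look up by name instead.

```r
# Hypothetical dictionary: x holds the old names, y the new names,
# paired by position. x is deliberately NOT in sorted order.
x <- c("D", "B", "A", "C")
y <- c("four", "two", "one", "three")

# A factor whose values come from x; note that "C" never occurs,
# so levels(z) is A, B, D rather than the full dictionary.
z <- factor(c("B", "D", "A", "B"))

# Lookup by name with match(): position of each value of z in x.
by_match <- y[match(as.character(z), x)]

# The same lookup through a named vector used as an associative array.
lookup  <- setNames(y, x)
by_name <- unname(lookup[as.character(z)])

# Relabelling the factor directly: levels = old names, labels = new names.
relabelled <- factor(z, levels = x, labels = y)

# Indexing by the factor itself uses its integer codes (A=1, B=2, D=3),
# which no longer correspond to positions in x.
by_code <- y[z]

print(by_match)
print(by_code)
```

Here by_match, by_name and as.character(relabelled) agree with each other, while by_code does not, which is one concrete way a factor can burn you when its levels and the dictionary are out of step.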
Petr PIKAL
2011-Oct-25 04:41 UTC
[R] How to create a new variable based on parts of another character variable: A generalization
Hi Bert

I am aware of factor features and, frankly speaking, I consider them quite useful despite the prevailing preference for character vectors.

For the OP's question, it seems to me that the ifelse construction is appropriate, given his statement that he has two strings to be converted to another two strings and that he is just starting with R.

I agree that when more levels need to change, a factor is the way to go.

Regards
Petr
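To make the two positions in this thread concrete, here is a small sketch that applies both to the data frame from the original question. The first assignment is the ifelse() line suggested earlier in the thread; the second is one possible generalization to more than two prefixes using match() on a small prefix dictionary. The names prefix, region and D2 are introduced here for illustration only, and the sketch assumes every value of A starts with a three-letter code.

```r
# The example data from the original question; B and C are kept as
# character because the posted values mix digits and letters.
DF <- data.frame(
  A = c("AWI-test1", "AWI-test5", "AWI-tes75", "UFT-2", "UFT56", "UFT356"),
  B = c("1", "2", "56", "5", "f", "9j"),
  C = c("i", "r", "z", "I", "t", "t"),
  stringsAsFactors = FALSE
)

# Two-case version from the thread: test the first character of A.
DF$D <- ifelse(substr(DF$A, 1, 1) == "A", "Arctic", "Boreal")

# Dictionary version (hypothetical names): map the first three
# characters of A through a prefix -> region table. Prefixes that are
# not in the table give NA instead of silently becoming "Boreal".
prefix <- c("AWI", "UFT")
region <- c("Arctic", "Boreal")
DF$D2 <- region[match(substr(DF$A, 1, 3), prefix)]

print(DF)
```

For two categories the ifelse() line is shorter; the dictionary version only needs another entry in prefix and region when a third category appears.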