Philipp Fischer
2011-Oct-23 13:35 UTC
[R] How to create a new variable based on parts of another character variable.
An embedded text with an undefined character set was scrubbed. Name: not available URL: <stat.ethz.ch/pipermail/r-help/attachments/20111023/ae046d6a/attachment.pl>
jim holtman
2011-Oct-23 14:14 UTC
[R] How to create a new variable based on parts of another character variable.
Use regular expressions: ?grepl

On Sunday, October 23, 2011, Philipp Fischer <Philipp.Fischer@awi.de> wrote:
> Hello,
> I am just starting with R and I am having a (most probably) stupid problem by creating a new variable in a data.frame based on a part of another character variable.
>
> I have a data frame like this one:
>
> A          B     C
> AWI-test1  1     i
> AWI-test5  2     r
> AWI-tes75  56    z
> UFT-2      5     I
> UFT56      f     t
> UFT356     9j    t
> etc. etc.  89    t
>
> I now want to look in the variable A if the string AWI is present and then create a variable D and putting "Arctic" inside. However, if the string UFT occurs in the variable A, then the variable D shall be "Boreal" etc. etc.
>
> The resulting data.frame file should look like
>
> A          B     C    D
> AWI-test1  1     i    Arctic
> AWI-test5  2     r    Arctic
> AWI-tes75  56    z    Arctic
> UFT-2      5     I    Boreal
> UFT56      f     t    Boreal
> UFT356     9j    t    Boreal
> etc. etc.  89    t
>
> I know how to do this when I want to look for the entire string of A, means when there is "AWI-test1" and then create the variable D with "Arctic", but not how to look only for a substring in A?
> Would be great if somebody might help.
> Thanks
> Philipp

--
Jim Holtman
Data Munger Guru
What is the problem that you are trying to solve?
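The pointer to ?grepl above comes without code, so here is a minimal sketch of how it might be applied to the question. The data frame name pf.df and the choice to leave unmatched rows as NA are assumptions made for illustration, not part of the original reply.

```r
# Sample data modelled on the table in the question (pf.df is an assumed name).
pf.df <- data.frame(
  A = c("AWI-test1", "AWI-test5", "AWI-tes75", "UFT-2", "UFT56", "UFT356"),
  B = c("1", "2", "56", "5", "f", "9j"),
  C = c("i", "r", "z", "I", "t", "t"),
  stringsAsFactors = FALSE
)

# Start with NA so that rows matching neither pattern stay visibly unlabelled.
pf.df$D <- NA_character_

# grepl() returns one TRUE/FALSE per element of A, which can be used
# directly as a logical index for assignment.
pf.df$D[grepl("AWI", pf.df$A)] <- "Arctic"
pf.df$D[grepl("UFT", pf.df$A)] <- "Boreal"

print(pf.df)
```

Because grepl() matches anywhere in the string, "AWI" would also match a label such as "X-AWI-1"; anchoring the pattern as "^AWI" would restrict it to labels that start with AWI.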
Jim Lemon
2011-Oct-24 10:22 UTC
[R] How to create a new variable based on parts of another character variable.
Hi Philipp,
Since you mentioned that you were just starting with R, it might be a little optimistic to throw you into the regular expression cage and expect you to emerge unscathed. You can do this by constructing a 2 column matrix or data frame of replacement values:

 replacements<-matrix(c("AWI","UFT","Arctic","Boreal"),ncol=2)
 replacements
      [,1]  [,2]
 [1,] "AWI" "Arctic"
 [2,] "UFT" "Boreal"

Then write a function using grep to replace the values:

 swapLabels<-function(x,y) {
  # return the replacement for the first pattern found in x
  for(swaprow in 1:dim(y)[1])
   if(length(grep(y[swaprow,1],x))) return(y[swaprow,2])
  # no pattern matched
  return(NA)
 }

Finally, apply the function to the first column of the data frame:

 pf.df$D<-unlist(lapply(pf.df[,1],swapLabels,replacements))
 pf.df$D
 [1] "Arctic" "Arctic" "Arctic" "Boreal" "Boreal" "Boreal"

Jim
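The reply above never shows how pf.df is built, so the following is a self-contained sketch of the same lookup-matrix approach that can be pasted into a fresh session. The sample data and the extra unmatched label "XYZ-1" are assumptions added here to show what happens when no pattern is found.

```r
# Hypothetical sample data: the six labels from the question plus one
# unmatched label ("XYZ-1") added to demonstrate the NA fallback.
pf.df <- data.frame(
  A = c("AWI-test1", "AWI-test5", "AWI-tes75",
        "UFT-2", "UFT56", "UFT356", "XYZ-1"),
  stringsAsFactors = FALSE
)

# Column 1 holds the patterns to search for, column 2 the replacement labels.
replacements <- matrix(c("AWI", "UFT", "Arctic", "Boreal"), ncol = 2)

# Return the replacement for the first pattern found in x, or NA if none match.
swapLabels <- function(x, y) {
  for (swaprow in seq_len(nrow(y))) {
    if (length(grep(y[swaprow, 1], x))) return(y[swaprow, 2])
  }
  NA
}

# Apply the function to each element of the first column.
pf.df$D <- unlist(lapply(pf.df[, 1], swapLabels, replacements))
print(pf.df$D)
```

Adding a third region only requires another row in the replacements matrix; the function itself stays unchanged.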
Petr PIKAL
2011-Oct-24 13:00 UTC
[R] How to create a new variable based on parts of another character variable.
Hi

If you want to avoid regular expressions altogether, and your A values start with AWI for Arctic and UFT for Boreal, you can use

 DF$D <- ifelse(substr(DF$A, 1,1) == "A", "Arctic", "Boreal")

Note that this labels every value that does not start with "A" as "Boreal".

Regards
Petr
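The substr() idea above handles exactly two groups. One possible way to extend it to more prefixes, still without regular expressions, is a named lookup vector. This is only a sketch: the name region, the sample data, and the assumption that every code is exactly the first three characters of A are illustrative and not from the original posts.

```r
# Hypothetical sample data; "XYZ-1" is added to show the unmatched case.
DF <- data.frame(
  A = c("AWI-test1", "AWI-test5", "AWI-tes75",
        "UFT-2", "UFT56", "UFT356", "XYZ-1"),
  stringsAsFactors = FALSE
)

# Named vector used as a lookup table: names are prefixes, values are labels.
region <- c(AWI = "Arctic", UFT = "Boreal")

# Take the first three characters of A and index the lookup table by name.
# Prefixes that are not in the table yield NA instead of a wrong label.
DF$D <- unname(region[substr(DF$A, 1, 3)])
print(DF)
```

Unlike the ifelse() version, an unknown prefix ends up as NA here, which makes unexpected labels easier to spot.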