D. Alain
2011-Feb-04 12:32 UTC
[R] recode according to specific sequence of characters within a string variable
Dear R-List,

I have a dataframe with one column "name.of.report" containing character values, e.g.

    > df$name.of.report
    "jeff_2001_teamx"
    "teamy_jeff_2002"
    "robert_2002_teamz"
    "mary_2002_teamz"
    "2003_mary_teamy"
    ...

(i.e. the bit of interest is not always at same position)

Now I want to recode the column "name.of.report" into the variables "person", "year", "team", like this

    > new.df
    "person"  "year"  "team"
    jeff      2001    x
    jeff      2002    y
    robert    2002    z
    mary      2002    z

I tried with grep()

    df$person <- grep("jeff", df$name.of.report)

but of course it didn't exactly result in what I wanted to do. Could not find any solution via RSeek. Excuse me if it is a very silly question, but can anyone help me find a way out of this?

Thanks a lot

Alain
Marc Schwartz
2011-Feb-04 13:09 UTC
[R] recode according to specific sequence of characters within a string variable
There will be several approaches, all largely involving the use of ?regex. Here is one:

    DF <- data.frame(name.of.report = c("jeff_2001_teamx", "teamy_jeff_2002",
                                        "robert_2002_teamz", "mary_2002_teamz",
                                        "2003_mary_teamy"))

    > DF
         name.of.report
    1   jeff_2001_teamx
    2   teamy_jeff_2002
    3 robert_2002_teamz
    4   mary_2002_teamz
    5   2003_mary_teamy

    DF.new <- data.frame(person = gsub("[_0-9]|team.", "", DF$name.of.report),
                         year = gsub(".*([0-9]{4}).*", "\\1", DF$name.of.report),
                         team = gsub(".*team(.).*", "\\1", DF$name.of.report))

    > DF.new
      person year team
    1   jeff 2001    x
    2   jeff 2002    y
    3 robert 2002    z
    4   mary 2002    z
    5   mary 2003    y

HTH,

Marc Schwartz
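As one of the "several approaches" mentioned above, here is a minimal sketch that extracts the matching piece directly with regexpr() and regmatches() instead of deleting everything around it. The helper name first_match and the variable names reports and parts are made up for illustration and do not appear in the thread. The sketch also assumes lowercase names and that no person's name starts with "team".

```r
# Same sample data as in the reply above.
reports <- c("jeff_2001_teamx", "teamy_jeff_2002", "robert_2002_teamz",
             "mary_2002_teamz", "2003_mary_teamy")

# Hypothetical helper (not from the thread): return the first match of
# `pattern` in each string, or NA where nothing matches.
first_match <- function(pattern, x) {
  m <- regexpr(pattern, x, perl = TRUE)
  out <- rep(NA_character_, length(x))
  out[m != -1] <- regmatches(x, m)
  out
}

parts <- data.frame(
  # A run of letters that starts a token and is not the "team" token.
  person = first_match("(?<![a-z])(?!team)[a-z]+", reports),
  # The first run of four digits, converted to integer.
  year   = as.integer(first_match("[0-9]{4}", reports)),
  # The single character right after "team" (the lookbehind keeps "team" out).
  team   = first_match("(?<=team).", reports),
  stringsAsFactors = FALSE
)

print(parts)
```

One difference in behaviour is worth knowing: when a pattern does not match, the gsub() calls return the input string unchanged, whereas this helper returns NA, which may make malformed report names easier to spot.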
Greg Snow
2011-Feb-04 17:52 UTC
[R] recode according to specific sequence of characters within a string variable
You can do this with regular expressions. Since you want to extract specific values from the string, I would suggest learning about the gsubfn package; it is a bit easier with gsubfn than with the other matching tools.

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
801.408.8111
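A rough sketch of what the gsubfn route might look like, using strapply() from that package to pull out the year and the team letter. This is only one possible reading of the suggestion, not code from the thread. It assumes the gsubfn package is installed (the block does nothing otherwise), and the variable names are made up for illustration.

```r
reports <- c("jeff_2001_teamx", "teamy_jeff_2002", "robert_2002_teamz",
             "mary_2002_teamz", "2003_mary_teamy")

# Assumption: gsubfn is installed; skip quietly if it is not.
if (requireNamespace("gsubfn", quietly = TRUE)) {
  # strapply() applies a function to each match. When the pattern contains
  # a capture group, the function receives the captured text.
  year <- gsubfn::strapply(reports, "([0-9]{4})", as.numeric, simplify = c)
  team <- gsubfn::strapply(reports, "team(.)", c, simplify = c)

  print(data.frame(year = year, team = team))
}
```

The person name is left out here because a plain letters-only pattern would also match the "team" token; it can be taken from the gsub() call shown earlier in the thread.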