irene
2012-Mar-18 14:44 UTC
[R] Extracting numbers from a character variable of different types
Hello,

I have a file which contains a column with age, which is represented in the following two patterns:

1. "007/A" or "007/a" or "7 /a" ... In this case A or a means year, and I would like to extract only the numeric value, e.g. 7 in the above case, if this pattern exists in a line of the file.

2. "004/M" or "004/m", where M or m means month ... For these lines I would like to first extract the numeric value of the month, e.g. 4, and then convert it into a value in years, which would be 0.33, i.e. 4 divided by 12.

Can anyone help?

Thank you

--
View this message in context: http://r.789695.n4.nabble.com/Extracting-numbers-from-a-character-variable-of-different-types-tp4482248p4482248.html
Sent from the R help mailing list archive at Nabble.com.
David Winsemius
2012-Mar-18 15:37 UTC
[R] Extracting numbers from a character variable of different types
On Mar 18, 2012, at 10:44 AM, irene wrote:

> Hello,
>
> I have a file which contains a column with age, which is represented in the
> following two patterns
>
> 1. "007/A" or "007/a" or "7 /a" ... In this case A or a means year and I
> would like to extract only the numeric value, e.g. 7 in the above case, if this
> pattern exists in a line of the file.
>
> 2. "004/M" or "004/m" where M or m means month ... for these lines I
> would like to first extract the numeric value of the month, e.g. 4, and then
> convert it into a value in years, which would be 0.33, i.e. 4 divided by 12.

I thought it easier to get to months as an initial step:

    > dfrm <- read.table(text="'007/A'\n'007/a' \n '7 /a '\n '004/M'\n'004/m'")
    > # years: keep the number and append " * 12" so it evaluates to months
    > dfrm$agenew <- sub("(^\\d+\\s*)(/)([aA])","\\1 * 12", dfrm$V1)
    > # months: keep the number only
    > dfrm$agenew2 <- sub("(^\\d+\\s*)(/)([mM])","\\1", dfrm$agenew)
    > dfrm$agenew2
    [1] "007 * 12" "007 * 12" "7  * 12 " "004"      "004"
    > # evaluating the whole vector at once returns only the last value
    > eval(parse(text=dfrm$agenew2))
    [1] 4
    > # so evaluate each element separately
    > sapply(dfrm$agenew2, function(x) eval(parse(text=x)) )
     007 * 12  007 * 12 7  * 12        004       004
           84        84        84         4         4

David Winsemius, MD
West Hartford, CT
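The reply above ends with ages in months. A minimal sketch of one way to get years directly, without eval(parse()), might look like the following. It is not part of the original reply; the names age_raw, pattern, num, unit and age_years are made up for illustration, and it assumes every entry has the form digits, optional spaces, a slash, then A/a or M/m.

```r
# Hypothetical sample data mirroring the patterns in the question
age_raw <- c("007/A", "007/a", "7 /a", "004/M", "004/m")

# One pattern with two capture groups: the digits and the unit letter
pattern <- "^\\s*(\\d+)\\s*/\\s*([aAmM])\\s*$"

num  <- as.numeric(sub(pattern, "\\1", age_raw))  # "007" becomes 7
unit <- toupper(sub(pattern, "\\2", age_raw))     # "A" (years) or "M" (months)

# Years are kept as they are; months are divided by 12
age_years <- ifelse(unit == "A", num, num / 12)
round(age_years, 2)
```

Entries that do not match the assumed pattern are left unchanged by sub(), so as.numeric() would turn them into NA with a warning; that makes unexpected values easy to spot.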
Daniel Malter
2012-Mar-18 19:17 UTC
[R] Extracting numbers from a character variable of different types
Assume your year value is

    x <- "007/A"

You want to replace all non-numeric characters (i.e. letters and punctuation) and any leading zeros with nothing (stripping every 0 would turn "010" into "1"):

    gsub('[[:alpha:]]|[[:punct:]]|^0+', '', x)

Let's say you have a vector with both month and year values (you can separate them). Now we need to identify the cells that have a month or year indicator:

    x <- c("007/A", "007/a", "003/M", "003/m")
    grep("/A|/a", x)  # cells in x with year information
    grep("/M|/m", x)  # cells in x with month information

To remove all letters, punctuation, and leading 0s from x, do:

    gsub('[[:alpha:]]|[[:punct:]]|^0+', '', x)

which you can also do specifically for the cells that identify years and months, respectively:

    years <- gsub('[[:alpha:]]|[[:punct:]]|^0+', '', x[grep("/A|/a", x)])   # years
    years
    months <- gsub('[[:alpha:]]|[[:punct:]]|^0+', '', x[grep("/M|/m", x)])  # months
    months

Convert the resulting character vectors into numeric vectors with as.numeric(years), for example. The month values can then be divided by 12 to give years.

HTH,
Daniel
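The steps in this reply can be combined into a single vector of ages in years. The following is only a sketch under stated assumptions, not Daniel's own code: the names is_month, value and age_years are hypothetical, the extra entry "010/A" is added to the sample input for illustration, and the unit letter is assumed to follow the slash.

```r
# Hypothetical input; "010/A" is added to show a value with a trailing zero
x <- c("007/A", "007/a", "003/M", "003/m", "010/A")

# TRUE where the unit after the slash is M or m (months)
is_month <- grepl("/\\s*[Mm]", x)

# Drop everything that is not a digit; as.numeric() ignores leading zeros
value <- as.numeric(gsub("[^0-9]", "", x))

# Start from the raw numbers, then rescale only the month entries
age_years <- value
age_years[is_month] <- value[is_month] / 12
age_years
```

Removing only non-digit characters, and leaving the zeros to as.numeric(), keeps a value such as "010" at 10.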
irene
2012-Mar-26 14:04 UTC
[R] Extracting numbers from a character variable of different types
It worked perfectly! Thank you.