Stephen HK Wong
2014-Aug-01 01:51 UTC
[R] how to extract word before /// in a data frame contain many thousands rows.
Dear All,

I would appreciate your help with this. I have a data frame containing many thousands of rows, and some rows contain the /// symbol, as shown in row 2. I want to extract the word before ///, which in this case is CDH23. Many thanks.

   Probe.Set.ID            Gene.Symbol
1  1552301_a_at                  CORO6
2  1552436_a_at CDH23 /// LOC100653137
3  1552477_a_at                   IRF6
4  1552685_a_at                  GRHL1
5    1552742_at                  KCNH8
6  1552752_a_at                  CADM2
7    1552799_at                TSNARE1
8  1552897_a_at                  KCNG3
9  1552902_a_at                  FOXP2
10   1552903_at               B4GALNT2

structure(list(Probe.Set.ID = c("1552301_a_at", "1552436_a_at",
"1552477_a_at", "1552685_a_at", "1552742_at", "1552752_a_at",
"1552799_at", "1552897_a_at", "1552902_a_at", "1552903_at"),
    Gene.Symbol = c("CORO6", "CDH23 /// LOC100653137", "IRF6",
    "GRHL1", "KCNH8", "CADM2", "TSNARE1", "KCNG3", "FOXP2", "B4GALNT2"
    )), .Names = c("Probe.Set.ID", "Gene.Symbol"), row.names = c(NA,
10L), class = "data.frame")

Stephen HK Wong
arun
2014-Aug-01 05:28 UTC
[R] how to extract word before /// in a data frame contain many thousands rows.
Try this, where dat is the dataset:

    library(stringr)
    res <- str_extract(dat$Gene.Symbol, perl('[[:alnum:]]+(?= \\/)'))
    res[!is.na(res)]
    #[1] "CDH23"

A.K.
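For comparison, here is a minimal base-R sketch of the same extraction, which needs no add-on package. It is not the method posted above but an alternative under stated assumptions: dat is rebuilt here from the first three rows of the poster's data, and the names has_sep and first_symbol are made up for illustration. It also assumes that the wanted word is everything before the first ///.

```r
# Small subset of the data frame from the original post (first three rows).
dat <- data.frame(
  Probe.Set.ID = c("1552301_a_at", "1552436_a_at", "1552477_a_at"),
  Gene.Symbol  = c("CORO6", "CDH23 /// LOC100653137", "IRF6"),
  stringsAsFactors = FALSE
)

# Flag the rows whose Gene.Symbol contains the literal "///" separator.
# (has_sep is a hypothetical helper name.)
has_sep <- grepl("///", dat$Gene.Symbol, fixed = TRUE)

# Remove the first "///", any whitespace before it, and everything after it.
# Rows without "///" are returned unchanged.
# (first_symbol is a hypothetical name.)
first_symbol <- sub("[[:space:]]*///.*$", "", dat$Gene.Symbol)

# Only the symbols taken from rows that actually had a separator.
print(first_symbol[has_sep])
```

Because sub() leaves non-matching strings untouched, first_symbol can be assigned straight back to dat$Gene.Symbol if the aim is to clean the whole column rather than to pull out only the affected rows.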