I have produced a terribly inefficient piece of code. In the end, it gives exactly what I need, but it clumsily works through multiple steps that I'm sure could be reduced.

Below is a reproducible example. What I have to begin with is a character vector, dimInfo. What I want to do is parse this vector to 1) find the elements containing 'HS' and 2) grab *only* the first character after the "HS_". The final line of code in the example gives what I need.

Any suggestions on a better approach?

Harold

dimInfo <- c("RecordID", "oppID", "position", "key", "operational", "IsSelected",
  "score", "item_1_HS_conv_ovrl_scr", "item_1_HS_elab_ovrl_scr",
  "item_1_HS_org_ovrl_scr")

# keep only the elements that contain 'HS'
ff <- dimInfo[grep('HS', dimInfo)]
# split each element at 'HS_'
gg <- strsplit(ff, 'HS_')
# take the piece after 'HS_' (1:3 is hard-coded to the three matches)
hh <- sapply(1:3, function(i) gg[[i]][2])
# first character of that piece
substr(hh, 1, 1)
Hello,

What about the following?

ff <- dimInfo[grep('HS', dimInfo)]
sub("^.*HS_([[:alnum:]]).*$", "\\1", ff)

Hope this helps,

Rui Barradas
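One detail of the sub() approach may be worth spelling out: sub() returns any element that does not match the pattern unchanged, which is presumably why the grep() filter comes first. A minimal sketch, using a shortened copy of dimInfo for brevity (the names pattern, unfiltered and filtered are illustrative, not from the thread):

```r
# Shortened copy of the vector from the question (assumption: a subset is
# enough to show the behaviour).
dimInfo <- c("RecordID", "score", "item_1_HS_conv_ovrl_scr",
             "item_1_HS_elab_ovrl_scr", "item_1_HS_org_ovrl_scr")

pattern <- "^.*HS_([[:alnum:]]).*$"

# Without filtering, elements that do not match are passed through untouched.
unfiltered <- sub(pattern, "\\1", dimInfo)
print(unfiltered)

# Filtering first (value = TRUE returns the matching elements themselves)
# leaves only the captured characters.
filtered <- sub(pattern, "\\1", grep("HS_", dimInfo, value = TRUE))
print(filtered)
```

Here unfiltered still contains "RecordID" and "score", while filtered holds only the three captured characters.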
Base:

Filter(Negate(is.na), sapply(regmatches(dimInfo, regexec("HS_(.{1})", dimInfo)), "[", 2))

Modernverse:

library(stringi)
library(purrr)

stri_match_first_regex(dimInfo, "HS_(.{1})")[,2] %>% discard(is.na)

Both use a capture group to find the matches and return just the captured text. The "{1}" isn't really necessary, but I include it to show that you can match whatever length you want, in this case just 1 character.

On Thu, Sep 15, 2016 at 12:17 PM, Doran, Harold <HDoran at air.org> wrote:
> What I want to do is parse this vector 1) find the
> elements containing 'HS' and 2) grab *only* the first character after the
> "HS_".
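To illustrate the point about "{1}" being adjustable, here is a rough sketch that wraps the base-R version in a small helper. The function chars_after_hs() and its argument n are hypothetical names introduced only for this example; vapply() is swapped in for sapply() so the result is always a character vector.

```r
dimInfo <- c("RecordID", "oppID", "position", "key", "operational",
             "IsSelected", "score", "item_1_HS_conv_ovrl_scr",
             "item_1_HS_elab_ovrl_scr", "item_1_HS_org_ovrl_scr")

# Hypothetical helper: return the first n characters that follow "HS_"
# in each element that contains it.
chars_after_hs <- function(x, n = 1) {
  pattern <- sprintf("HS_(.{%d})", n)
  # regexec() + regmatches() give a list: character(0) for a non-match,
  # c(full match, capture group) for a match.
  m <- regmatches(x, regexec(pattern, x))
  # Element 2 is the capture group; it is NA where there was no match.
  out <- vapply(m, function(hit) hit[2], character(1))
  out[!is.na(out)]
}

one_char <- chars_after_hs(dimInfo)
three_chars <- chars_after_hs(dimInfo, 3)
print(one_char)
print(three_chars)
```

With n = 1 this should reproduce the result asked for in the question; with n = 3 it returns the first three characters after "HS_" instead.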
Thanks for the reproducible example.

Using regular expressions:

sub(".*HS_(.).*", "\\1", dimInfo[grep("HS_",dimInfo)])

grep() returns the indices of the elements that contain "HS_". sub() is then applied to the subvector they index: the pattern spans the whole string and captures the character that follows "HS_", so each element is replaced by just that character.

Cheers,
Bert

Bert Gunter
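A possible one-step variant, not proposed in the thread: regmatches() combined with regexpr() returns only the substrings that actually matched, so the separate grep() filter can be dropped. This sketch assumes perl = TRUE so that the lookbehind (?<=HS_) is available; the names one_step and two_step are illustrative.

```r
dimInfo <- c("RecordID", "oppID", "position", "key", "operational",
             "IsSelected", "score", "item_1_HS_conv_ovrl_scr",
             "item_1_HS_elab_ovrl_scr", "item_1_HS_org_ovrl_scr")

# Lookbehind: match a single character that is immediately preceded by "HS_".
# regmatches() silently drops the elements where regexpr() found no match.
one_step <- regmatches(dimInfo, regexpr("(?<=HS_).", dimInfo, perl = TRUE))

# The grep() + sub() version from the message above, for comparison.
two_step <- sub(".*HS_(.).*", "\\1", dimInfo[grep("HS_", dimInfo)])

print(one_step)
print(identical(one_step, two_step))
```

Whether this reads better than grep() plus sub() is a matter of taste; the lookbehind keeps the call short but needs the PCRE engine.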