I have produced a terribly inefficient piece of code. In the end, it gives
exactly what I need, but it clumsily goes through multiple steps that I'm
sure could be reduced to something more efficient.
Below is a reproducible example. What I start with is a character vector,
dimInfo. I want to parse this vector to 1) find the elements containing
'HS' and 2) grab *only* the first character after the "HS_".
The final line of code in the example gives what I need.
Any suggestions on a better approach?
Harold
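dimInfo <- c("RecordID", "oppID", "position",
             "key", "operational", "IsSelected",
             "score", "item_1_HS_conv_ovrl_scr",
             "item_1_HS_elab_ovrl_scr",
             "item_1_HS_org_ovrl_scr")
# keep only the elements containing 'HS'
ff <- dimInfo[grep('HS', dimInfo)]
# split each element at 'HS_'; the second piece starts with the wanted character
gg <- strsplit(ff, 'HS_')
# take the second piece of each split (seq_along() avoids hard-coding 3)
hh <- sapply(seq_along(gg), function(i) gg[[i]][2])
# keep only the first character
substr(hh, 1, 1)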
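To make the multi-step version concrete, the sketch below checks what each intermediate object holds for this dimInfo. It also swaps the indexed sapply() for vapply(), which takes the second piece of every split without needing to know how many elements matched; that swap is an illustration, not part of the original code.

```r
dimInfo <- c("RecordID", "oppID", "position", "key", "operational",
             "IsSelected", "score", "item_1_HS_conv_ovrl_scr",
             "item_1_HS_elab_ovrl_scr", "item_1_HS_org_ovrl_scr")

ff <- dimInfo[grep('HS', dimInfo)]
# ff holds the three "item_1_HS_..." names

gg <- strsplit(ff, 'HS_')
# each element of gg is a two-piece split, e.g. c("item_1_", "conv_ovrl_scr")

hh <- vapply(gg, `[`, character(1), 2)
# hh is c("conv_ovrl_scr", "elab_ovrl_scr", "org_ovrl_scr")

res <- substr(hh, 1, 1)
res
```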
Hello,
What about the following?
ff <- dimInfo[grep('HS', dimInfo)]
sub("^.*HS_([[:alnum:]]).*$", "\\1", ff)
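The grep() filter before the sub() call matters: sub() returns elements that do not match the pattern unchanged, so applying it to the whole of dimInfo would leave "RecordID", "oppID", and so on in the result. A quick check (the variable names pat and unfiltered are chosen here for illustration):

```r
dimInfo <- c("RecordID", "oppID", "position", "key", "operational",
             "IsSelected", "score", "item_1_HS_conv_ovrl_scr",
             "item_1_HS_elab_ovrl_scr", "item_1_HS_org_ovrl_scr")

pat <- "^.*HS_([[:alnum:]]).*$"

# sub() on the full vector: non-matching names come back untouched
unfiltered <- sub(pat, "\\1", dimInfo)

# sub() on the grep()-filtered subset: only the captured characters remain
ff <- dimInfo[grep('HS', dimInfo)]
res <- sub(pat, "\\1", ff)
res
```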
Hope this helps,
Rui Barradas
Quoting Doran, Harold <HDoran at air.org>:
> [...]
Base:
Filter(Negate(is.na),
       sapply(regmatches(dimInfo, regexec("HS_(.{1})", dimInfo)), "[", 2))
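Why the Filter(Negate(is.na), ...) step is needed: regmatches() returns a list with character(0) for every element that regexec() did not match, and "[" with index 2 turns each of those into NA. A sketch showing the intermediate values (m and second are names chosen here for illustration):

```r
dimInfo <- c("RecordID", "oppID", "position", "key", "operational",
             "IsSelected", "score", "item_1_HS_conv_ovrl_scr",
             "item_1_HS_elab_ovrl_scr", "item_1_HS_org_ovrl_scr")

m <- regmatches(dimInfo, regexec("HS_(.{1})", dimInfo))
# m[[1]] is character(0) (no match); m[[8]] is c("HS_c", "c"):
# the full match followed by the capture group

second <- sapply(m, "[", 2)
# second is NA for the seven non-matching names

res <- Filter(Negate(is.na), second)
res
```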
Modernverse:
library(stringi)
library(purrr)
stri_match_first_regex(dimInfo, "HS_(.{1})")[,2] %>%
discard(is.na)
They both use capture groups to find the matches and return just the
captured text. The "{1}" isn't really necessary, but I included it to show
that you can match whatever length you want, in this case just 1 char.
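To illustrate the point about "{1}", here is a small base-R wrapper that builds the pattern with a variable repeat count. grab_after_hs() is a hypothetical helper, not from the thread; it uses regexec()/regmatches() rather than stringi so it runs without extra packages.

```r
dimInfo <- c("RecordID", "oppID", "position", "key", "operational",
             "IsSelected", "score", "item_1_HS_conv_ovrl_scr",
             "item_1_HS_elab_ovrl_scr", "item_1_HS_org_ovrl_scr")

# Hypothetical helper: return the n characters following "HS_" in each
# element that contains a match (elements without a match are dropped).
grab_after_hs <- function(x, n) {
  pat <- sprintf("HS_(.{%d})", as.integer(n))
  m <- regmatches(x, regexec(pat, x))
  m <- Filter(length, m)           # drop character(0), i.e. no match
  vapply(m, `[`, character(1), 2)  # keep only the capture group
}

grab_after_hs(dimInfo, 1)
grab_after_hs(dimInfo, 4)
```

Note that "." also matches "_", so with n = 4 the third result is "org_".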
On Thu, Sep 15, 2016 at 12:17 PM, Doran, Harold <HDoran at air.org> wrote:
> [...]
Thanks for the reproducible example.
Using regular expressions:
sub(".*HS_(.).*", "\\1", dimInfo[grep("HS_", dimInfo)])
The grep() call returns the indices of the elements that contain "HS_",
and sub() then replaces each element of the subvector indexed by them
with the single character captured after "HS_".
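A sketch of the two steps described above, plus two variants: grep()'s value = TRUE argument returns the matching elements directly instead of their indices, and a Perl-style lookbehind with regexpr() (a swapped-in technique, not from the thread) extracts the character in a single pass, because regmatches() with regexpr() match data drops non-matching elements.

```r
dimInfo <- c("RecordID", "oppID", "position", "key", "operational",
             "IsSelected", "score", "item_1_HS_conv_ovrl_scr",
             "item_1_HS_elab_ovrl_scr", "item_1_HS_org_ovrl_scr")

# Step 1: grep() gives the positions of the elements containing "HS_"
idx <- grep("HS_", dimInfo)

# Step 2: sub() replaces each selected element with its captured character
res_idx <- sub(".*HS_(.).*", "\\1", dimInfo[idx])

# Variant: value = TRUE returns the matching elements themselves
res_val <- sub(".*HS_(.).*", "\\1", grep("HS_", dimInfo, value = TRUE))

# Variant (not from the thread): match the character right after "HS_"
# with a lookbehind; regmatches() keeps only the elements that matched
res_look <- regmatches(dimInfo, regexpr("(?<=HS_).", dimInfo, perl = TRUE))
```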
Cheers,
Bert
Bert Gunter
"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)
On Thu, Sep 15, 2016 at 9:17 AM, Doran, Harold <HDoran at air.org> wrote:
> [...]