thr3ads.net - R help - [R] Better use of regex [Sep 2016]

Doran, Harold

2016-Sep-15 16:17 UTC

[R] Better use of regex

I have produced a terribly inefficient piece of code. In the end, it gives
exactly what I need, but it clumsily works through multiple steps that I'm
sure could be reduced to something more efficient.

Below is a reproducible example. What I have to begin with is a character vector,
dimInfo. What I want to do is parse this vector to 1) find the elements containing
'HS' and 2) grab *only* the first character after the "HS_".
The final line of code in the example gives what I need.

Any suggestions on a better approach?

Harold


dimInfo <- c("RecordID", "oppID", "position",
             "key", "operational", "IsSelected",
             "score", "item_1_HS_conv_ovrl_scr",
             "item_1_HS_elab_ovrl_scr",
             "item_1_HS_org_ovrl_scr")

ff <- dimInfo[grep('HS', dimInfo)]                   # keep only elements containing 'HS'
gg <- strsplit(ff, 'HS_')                            # split each one at "HS_"
hh <- sapply(seq_along(gg), function(i) gg[[i]][2])  # the piece after "HS_"
substr(hh, 1, 1)                                     # its first character

ruipbarradas at sapo.pt

2016-Sep-15 16:35 UTC

[R] Better use of regex

Hello,

What about the following?

ff <- dimInfo[grep('HS', dimInfo)]          # elements containing 'HS'
sub("^.*HS_([[:alnum:]]).*$", "\\1", ff)    # keep only the character captured after "HS_"
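A related variant, offered here only as a sketch and not as part of the reply above: with `perl = TRUE`, a lookbehind `(?<=HS_)` lets `regexpr()` match just the character after "HS_", and `regmatches()` then returns only the elements that matched, so the separate `grep()` step is not needed. The lookbehind is a swapped-in technique; the vector is copied from the original post.

```r
dimInfo <- c("RecordID", "oppID", "position",
             "key", "operational", "IsSelected",
             "score", "item_1_HS_conv_ovrl_scr",
             "item_1_HS_elab_ovrl_scr",
             "item_1_HS_org_ovrl_scr")

# (?<=HS_) requires "HS_" immediately before the match without consuming it,
# so the match itself is the single alphanumeric character that follows.
m <- regexpr("(?<=HS_)[[:alnum:]]", dimInfo, perl = TRUE)

# For regexpr() results, regmatches() drops the elements with no match.
regmatches(dimInfo, m)   # "c" "e" "o"
```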


Hope this helps,

Rui Barradas


Quoting Doran, Harold <HDoran at air.org>:
> I have produced a terribly inefficient piece of codes. In the end,  
> it gives exactly what I need, but clumsily steps through multiple  
> steps which I'm sure could be more efficiently reduced.
>
> Below is a reproducible example. What I have to begin with is  
> character vector, dimInfo. What I want to do is parse this vector 1)  
> find the elements containing 'HS' and 2) grab *only* the first  
> character after the "HS_". The final line of code in the example
> gives what I need.
>
> Any suggestions on a better approach?
>
> Harold
>
>
> dimInfo <- c("RecordID", "oppID", "position",
> "key", "operational", "IsSelected",
> "score", "item_1_HS_conv_ovrl_scr",
> "item_1_HS_elab_ovrl_scr",
> "item_1_HS_org_ovrl_scr")
>
> ff <- dimInfo[grep('HS', dimInfo)]
> gg <- strsplit(ff, 'HS_')
> hh <- sapply(1:3, function(i) gg[[i]][2])
> substr(hh, 1, 1)
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

Bob Rudis

2016-Sep-15 16:38 UTC

[R] Better use of regex

Base:

    Filter(Negate(is.na),
           sapply(regmatches(dimInfo, regexec("HS_(.{1})", dimInfo)), "[", 2))

Modernverse:

    library(stringi)
    library(purrr)

    stri_match_first_regex(dimInfo, "HS_(.{1})")[,2] %>%
      discard(is.na)


They both use capture groups to find the matches and return just the
captured characters. The "{1}" isn't really necessary, but I included it to
show that you can match whatever length you want, in this case just 1 char.
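To illustrate the point about `{1}`, here is a minimal base-R sketch built on the same `regmatches()`/`regexec()` capture-group approach, with the quantifier width as a parameter. `first_after_hs()` is a hypothetical helper name, not part of any package, and `n = 4` is an arbitrary width chosen for illustration.

```r
dimInfo <- c("RecordID", "oppID", "position",
             "key", "operational", "IsSelected",
             "score", "item_1_HS_conv_ovrl_scr",
             "item_1_HS_elab_ovrl_scr",
             "item_1_HS_org_ovrl_scr")

# Hypothetical helper: return the `n` characters captured after "HS_",
# dropping elements where the pattern does not match.
first_after_hs <- function(x, n = 1) {
  pattern <- sprintf("HS_(.{%d})", n)      # e.g. "HS_(.{1})"
  m <- regmatches(x, regexec(pattern, x))  # per element: full match, then group
  # Non-matches come back as character(0); map them to NA, then drop them.
  out <- vapply(m, function(v) if (length(v) > 1) v[2] else NA_character_,
                character(1))
  out[!is.na(out)]
}

first_after_hs(dimInfo)         # "c" "e" "o"
first_after_hs(dimInfo, n = 4)  # "conv" "elab" "org_"
```

Note that with a wider quantifier, elements with fewer than `n` characters after "HS_" simply fail to match and are dropped.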

On Thu, Sep 15, 2016 at 12:17 PM, Doran, Harold <HDoran at air.org> wrote:
> I have produced a terribly inefficient piece of codes. In the end, it
> gives exactly what I need, but clumsily steps through multiple steps which
> I'm sure could be more efficiently reduced.
>
> Below is a reproducible example. What I have to begin with is character
> vector, dimInfo. What I want to do is parse this vector 1) find the
> elements containing 'HS' and 2) grab *only* the first character after the
> "HS_". The final line of code in the example gives what I need.
>
> Any suggestions on a better approach?
>
> Harold
>
>
> dimInfo <- c("RecordID", "oppID", "position",
> "key", "operational", "IsSelected",
> "score", "item_1_HS_conv_ovrl_scr",
> "item_1_HS_elab_ovrl_scr",
> "item_1_HS_org_ovrl_scr")
>
> ff <- dimInfo[grep('HS', dimInfo)]
> gg <- strsplit(ff, 'HS_')
> hh <- sapply(1:3, function(i) gg[[i]][2])
> substr(hh, 1, 1)

Bert Gunter

2016-Sep-15 16:47 UTC

[R] Better use of regex

Thanks for the reproducible example.

Using regular expressions:

sub(".*HS_(.).*", "\\1",
dimInfo[grep("HS_",dimInfo)])

The grep() gets just the indices of the elements that contain "HS_", and the
sub() picks out the character you want from the subvector indexed by them,
replacing the whole string with it.
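The two steps can be run separately to see what each one contributes. This is a minimal sketch reusing the `dimInfo` vector from the original post; the names `idx` and `hs` are just illustrative. It also shows why the grep() filter matters: sub() leaves non-matching strings unchanged rather than dropping them.

```r
dimInfo <- c("RecordID", "oppID", "position",
             "key", "operational", "IsSelected",
             "score", "item_1_HS_conv_ovrl_scr",
             "item_1_HS_elab_ovrl_scr",
             "item_1_HS_org_ovrl_scr")

# Step 1: positions of the elements that contain "HS_"
idx <- grep("HS_", dimInfo)
idx                                       # 8 9 10

# Step 2: replace each whole string with the one character captured after "HS_"
hs <- sub(".*HS_(.).*", "\\1", dimInfo[idx])
hs                                        # "c" "e" "o"

# Without the grep() filter, non-matching elements come back unchanged
sub(".*HS_(.).*", "\\1", dimInfo)[1:2]    # "RecordID" "oppID"

# grepl() selects the same subset via a logical index
identical(dimInfo[grepl("HS_", dimInfo)], dimInfo[idx])   # TRUE
```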

Cheers,
Bert
Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)


On Thu, Sep 15, 2016 at 9:17 AM, Doran, Harold <HDoran at air.org> wrote:
> I have produced a terribly inefficient piece of codes. In the end, it gives
> exactly what I need, but clumsily steps through multiple steps which I'm
> sure could be more efficiently reduced.
>
> Below is a reproducible example. What I have to begin with is character
> vector, dimInfo. What I want to do is parse this vector 1) find the elements
> containing 'HS' and 2) grab *only* the first character after the
> "HS_". The final line of code in the example gives what I need.
>
> Any suggestions on a better approach?
>
> Harold
>
>
> dimInfo <- c("RecordID", "oppID", "position",
> "key", "operational", "IsSelected",
> "score", "item_1_HS_conv_ovrl_scr",
> "item_1_HS_elab_ovrl_scr",
> "item_1_HS_org_ovrl_scr")
>
> ff <- dimInfo[grep('HS', dimInfo)]
> gg <- strsplit(ff, 'HS_')
> hh <- sapply(1:3, function(i) gg[[i]][2])
> substr(hh, 1, 1)
