thr3ads.net - R help - [R] Split String in regex while Keeping Delimiter [Apr 2023]


Emily Bakker

2023-Apr-12 08:29 UTC

[R] Split String in regex while Keeping Delimiter

Hello List,

I have a dataset consisting of strings that I want to split while saving the
delimiter.

Some example data:
“leucocyten + gramnegatieve staven +++ grampositieve staven ++”
“leucocyten – grampositieve coccen +”

I want to split the strings such that I get the following result:
c(“leucocyten +”, “gramnegatieve staven +++”, “grampositieve staven ++”)
c(“leucocyten –”, “grampositieve coccen +”)

I have tried strsplit with a regular expression with a positive lookahead, but I
am not able to achieve the results that I want.

I have tried:
as.list(strsplit(x, split = “(?=[\\+-]{1,3}\\s)+, perl=TRUE)

Which results in:
c(“leucocyten ”, “+”, “gramnegatieve staven ”, “+”, “+”, “+”, “grampositieve
staven ++”)
c(“leucocyten ”, “–”, “grampositieve coccen +”)

Is there a function or regular expression that will make this possible?

Kind regards,
Emily
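Another way to frame the question is to extract each piece instead of splitting. In that view, every piece is "some text followed by one to three + or - signs". The sketch below is not from the thread. It uses base R's regmatches() and gregexpr() and assumes two things: the item names never contain "+" or "-" themselves, and the minus is an ASCII hyphen.

```r
x <- c("leucocyten + gramnegatieve staven +++ grampositieve staven ++",
       "leucocyten - grampositieve coccen +")

# "[^+-]+"    : a run of characters that are neither "+" nor "-"
# "[+-]{1,3}" : followed by one to three "+" or "-" signs
tokens <- regmatches(x, gregexpr("[^+-]+[+-]{1,3}", x))

# Every token after the first begins with the separating space; trim it.
result <- lapply(tokens, trimws)
print(result)
```

A hyphenated item name such as "gram-negatief" would be cut at its hyphen. The splitting approaches in the replies share that limitation.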

Eric Berger

2023-Apr-12 17:32 UTC


This seems to do the job but there are probably more elegant solutions:

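# f: replace each "+ " with "+@" plus a newline, split on "@", then
# drop the newline left at the start of each later piece
f <- function(s) { sub("^\n","",unlist(strsplit(gsub("\\+ ","+@\n",s),"@"))) }
# g: the same for "- "
g <- function(s) { sub("^\n","",unlist(strsplit(gsub("- ","-@\n",s),"@"))) }
# h: split after both kinds of delimiter
h <- function(s) { g(f(s)) }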

To try it out:
s <- "leucocyten + gramnegatieve staven +++ grampositieve staven ++"
t <- "leucocyten - grampositieve coccen +"

h(s)
h(t)
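One possibly more compact variant of the marker idea behind f(), g() and h() is sketched below. It is not from the thread, and split_keep_marker is a hypothetical name. A single regex handles both signs, and because strsplit() is vectorised, the result keeps one element per input string. Applying h() to a vector of strings would instead merge all pieces into one vector. Like f() and g(), it assumes "@" never occurs in the data.

```r
split_keep_marker <- function(x) {
  # Append an "@" marker after any "+" or "-" that is followed by a
  # space; the space itself is consumed by the match.
  marked <- gsub("([+-]) ", "\\1@", x)
  # Split on the literal marker: one character vector per element of x.
  strsplit(marked, "@", fixed = TRUE)
}

split_keep_marker(c("leucocyten + gramnegatieve staven +++ grampositieve staven ++",
                    "leucocyten - grampositieve coccen +"))
```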

HTH,
Eric


> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

Ivan Krylov

2023-Apr-12 17:47 UTC


On Wed, 12 Apr 2023 08:29:50 +0000
Emily Bakker <emilybakker at outlook.com> wrote:
> Some example data:
> “leucocyten + gramnegatieve staven +++ grampositieve staven ++”
> “leucocyten – grampositieve coccen +”
>
> I want to split the strings such that I get the following result:
> c(“leucocyten +”, “gramnegatieve staven +++”,
> “grampositieve staven ++”)
> c(“leucocyten –”, “grampositieve coccen +”)
>
> I have tried strsplit with a regular expression with a positive
> lookahead, but I am not able to achieve the results that I want.
It sounds like you need positive look-behind, not look-ahead: split on
spaces only if they _follow_ one to three of '+' or '-'. Unfortunately,
repetition quantifiers like {n,m} or + are not directly supported in
look-behind expressions (nor in Perl itself). As a special case, you
can use \K: anything matched to the left of \K is left out of the
reported match, which gives the effect of a variable-length positive
look-behind:

x <- c(
 'leucocyten + gramnegatieve staven +++ grampositieve staven ++',
 'leucocyten - grampositieve coccen +'
)
strsplit(x, '[+-]{1,3}+\\K ', perl = TRUE)
# [[1]]
# [1] "leucocyten +"             "gramnegatieve staven +++"
# [3] "grampositieve staven ++" 
# 
# [[2]]
# [1] "leucocyten -"           "grampositieve coccen +"
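Ivan's point is that look-behind must be fixed-length. A one-character class such as [+-] is fixed-length, so a plain look-behind is also accepted here. The sketch below is not from the thread. It splits after any run of signs rather than only runs of one to three, which makes no difference for this data, because a space after "+++" is also preceded by a single "+".

```r
x <- c("leucocyten + gramnegatieve staven +++ grampositieve staven ++",
       "leucocyten - grampositieve coccen +")

# (?<=[+-]) looks back exactly one character, so PCRE accepts it;
# the match itself is just the space, which is removed by the split.
parts <- strsplit(x, "(?<=[+-]) ", perl = TRUE)
print(parts)
```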

-- 
Best regards,
Ivan

P.S. It looks like your e-mail client has transformed every quote
character into typographically-correct Unicode quotes “” and every
minus into an en dash, which makes it slightly harder to work with your
code, since typographically correct Unicode quotes are not R string
delimiters. Is it really – that you'd like to split upon, or is it -?
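Suppose the real data do contain en dashes (U+2013) where a minus was meant, as Ivan suspects. In that case, one option is to normalise them to ASCII hyphens before splitting. This is a sketch under that assumption, not code from the thread; it then applies Ivan's \K pattern unchanged.

```r
x <- c("leucocyten + gramnegatieve staven +++ grampositieve staven ++",
       "leucocyten \u2013 grampositieve coccen +")  # contains an en dash

# Replace every en dash with an ASCII hyphen (literal match, no regex).
x_ascii <- gsub("\u2013", "-", x, fixed = TRUE)

# Ivan's pattern now sees "-" and splits both strings.
parts <- strsplit(x_ascii, "[+-]{1,3}+\\K ", perl = TRUE)
print(parts)
```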

David Winsemius

2023-Apr-12 22:03 UTC


I thought replacing the spaces following instances of +++, ++, +, - with
"\n" and then reading with scan should succeed. Like Ivan Krylov, I was
fairly sure that you meant the minus sign to be "-" rather than
"–", but perhaps you were using MS Word as an editor, which is
inconsistent with effective use of R. If so, learn to use a proper programming
editor, and in any case learn to post to R-help in plain text.

-- 
David

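dat <- c("leucocyten + gramnegatieve staven +++ grampositieve staven ++",
         "leucocyten - grampositieve coccen +")
scan(text=gsub("([-+]){1}\\s", "\\1\n", dat),
     what="", sep="\n")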
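With a vector of several strings, a single scan() call reads all elements as lines of one text. The pieces therefore come back as one flat vector, and the link to the original string is lost. The sketch below is not from the thread. It shows that behaviour and one way to keep the per-string grouping: call scan() once per string. quiet = TRUE only suppresses the "Read N items" message.

```r
dat <- c("leucocyten + gramnegatieve staven +++ grampositieve staven ++",
         "leucocyten - grampositieve coccen +")

# Replace the space after each "+" or "-" with a newline.
marked <- gsub("([-+])\\s", "\\1\n", dat)

# One call over the whole vector: a single flat character vector.
flat <- scan(text = marked, what = "", sep = "\n", quiet = TRUE)

# One call per input string: a list with one vector per original string.
grouped <- lapply(marked, function(m)
  scan(text = m, what = "", sep = "\n", quiet = TRUE))
```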


