Hello List,

I have a dataset consisting of strings that I want to split while keeping the
delimiter.

Some example data:
“leucocyten + gramnegatieve staven +++ grampositieve staven ++”
“leucocyten – grampositieve coccen +”

I want to split the strings such that I get the following result:
c(“leucocyten +”, “gramnegatieve staven +++”, “grampositieve staven ++”)
c(“leucocyten –”, “grampositieve coccen +”)

I have tried strsplit with a regular expression with a positive lookahead, but I
am not able to achieve the results that I want.

I have tried:
as.list(strsplit(x, split = “(?=[\\+-]{1,3}\\s)+”, perl=TRUE))

Which results in:
c(“leucocyten ”, “+”, “gramnegatieve staven ”, “+”, “+”, “+”, “grampositieve
staven ++”)
c(“leucocyten ”, “–”, “grampositieve coccen +”)

Is there a function or regular expression that will make this possible?

Kind regards,
Emily

This seems to do the job, but there are probably more elegant solutions:

f <- function(s) { sub("^ ","",unlist(strsplit(gsub("\\+ ","+@ ",s),"@"))) }
g <- function(s) { sub("^ ","",unlist(strsplit(gsub("- ","-@ ",s),"@"))) }
h <- function(s) { g(f(s)) }

To try it out:

s <- "leucocyten + gramnegatieve staven +++ grampositieve staven ++"
t <- "leucocyten - grampositieve coccen +"
h(s)
h(t)

HTH,
Eric
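
The marker idea in the reply above can probably be folded into a single pass. The following is only a sketch: the function name split_keep and its marker argument are made up for illustration, and it assumes that the marker character ("@" by default) never occurs in the data and that the minus sign is a plain "-".

```r
# Hypothetical helper: put a marker after every '+' or '-' that is followed
# by a space, then split on that marker. Assumes the marker is not in the data.
split_keep <- function(s, marker = "@") {
  marked <- gsub("([+-]) ", paste0("\\1", marker), s)
  strsplit(marked, marker, fixed = TRUE)
}

x <- c(
  "leucocyten + gramnegatieve staven +++ grampositieve staven ++",
  "leucocyten - grampositieve coccen +"
)
res <- split_keep(x)
print(res)
```

Unlike h(), this returns a list with one element per input string, so the pieces of different strings stay separate.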
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
On Wed, 12 Apr 2023 08:29:50 +0000
Emily Bakker <emilybakker at outlook.com> wrote:

> Some example data:
> “leucocyten + gramnegatieve staven +++ grampositieve staven ++”
> “leucocyten – grampositieve coccen +”
>
> I want to split the strings such that I get the following result:
> c(“leucocyten +”, “gramnegatieve staven +++”,
> “grampositieve staven ++”)
> c(“leucocyten –”, “grampositieve coccen +”)
>
> I have tried strsplit with a regular expression with a positive
> lookahead, but I am not able to achieve the results that I want.

It sounds like you need positive look-behind, not look-ahead: split on
spaces only if they _follow_ one to three of '+' or '-'. Unfortunately,
repetition quantifiers like {n,m} or + are not directly supported in
look-behind expressions (nor in Perl itself). As a special case, you
can use \K, where anything to the left of \K is a zero-width positive
match:

x <- c(
 'leucocyten + gramnegatieve staven +++ grampositieve staven ++',
 'leucocyten - grampositieve coccen +'
)
strsplit(x, '[+-]{1,3}+\\K ', perl = TRUE)
# [[1]]
# [1] "leucocyten +" "gramnegatieve staven +++"
# "grampositieve staven ++"
#
# [[2]]
# [1] "leucocyten -" "grampositieve coccen +"

--
Best regards,
Ivan

P.S. It looks like your e-mail client has transformed every quote
character into typographically-correct Unicode quotes “” and every
minus into an en dash, which makes it slightly harder to work with your
code, since typographically correct Unicode quotes are not R string
delimiters. Is it really – that you'd like to split upon, or is it -?

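
For comparison, here is a minimal sketch of a plain look-behind variant. It assumes that every run of '+' or '-' is followed by exactly one space and that the minus sign is a plain "-". Under that assumption the space to split on always sits directly after the last character of the run, so a one-character (fixed-width) look-behind seems to be enough. The variable names are illustrative only.

```r
x <- c(
  "leucocyten + gramnegatieve staven +++ grampositieve staven ++",
  "leucocyten - grampositieve coccen +"
)

# Split on a space only when the character just before it is '+' or '-'.
# The look-behind is zero-width, so the delimiter stays with the left piece.
res_lookbehind <- strsplit(x, "(?<=[+-]) ", perl = TRUE)

# The \K form from the message above, kept here for side-by-side comparison.
res_K <- strsplit(x, "[+-]{1,3}+\\K ", perl = TRUE)

print(res_lookbehind)
print(res_K)
```

Both patterns consume the space itself, which avoids the zero-length matches that made the original look-ahead attempt break the '+' runs into separate pieces.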
I thought replacing the spaces following instances of +++, ++, +, - with
"\n" and then reading with scan should succeed. Like Ivan Krylov, I was
fairly sure that you meant the minus sign to be "-" rather than
"–", but perhaps you were using MS Word as an editor, which is
inconsistent with effective use of R. If so, learn to use a proper programming
editor, and in any case learn to post to R-help in plain text.
--
David
scan(text=gsub("([-+]){1}\\s", "\\1\n", dat),
what="", sep="\n")
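
The scan() call above refers to a vector dat that is not defined in the message. The sketch below fills that in with the two example strings from the question (an assumption, with a plain "-" for the minus sign) and wraps the call in lapply() so that the pieces stay grouped per input string; the names marked and pieces are illustrative.

```r
# Assumed input: the two example strings from the question, with a plain '-'.
dat <- c(
  "leucocyten + gramnegatieve staven +++ grampositieve staven ++",
  "leucocyten - grampositieve coccen +"
)

# Replace the whitespace after each '+' or '-' with a newline ...
marked <- gsub("([-+])\\s", "\\1\n", dat)

# ... then read each string back line by line, one list element per input.
pieces <- lapply(marked, function(txt) {
  scan(text = txt, what = "", sep = "\n", quiet = TRUE)
})
print(pieces)
```

Calling scan() once on the whole vector, as in the original one-liner, returns a single flat character vector instead of one element per string.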