Hi everyone,
I need a regular expression to find those positions in a character
vector which contain something which is not a number (either positive
or negative, having decimals or not).
myvector <- c("a3", "N.A", "1.2", "-3", "3-2", "2.")
In this vector, only positions 3 and 4 are numbers, the rest should be captured.
So far I am able to detect anything which is not a number, excluding - and .
> grep("[^-0-9.]", myvector)
[1] 1 2
I still need to capture positions 5 and 6, which in human language
would mean to detect anything which contains a "-" or a "." anywhere
else except at the beginning of a number.
Thanks very much in advance,
Adrian
--
Adrian Dusa
University of Bucharest
Romanian Social Data Archive
Soseaua Panduri nr.90
050663 Bucharest sector 5
Romania
See if the following will work for you:
grep('^-?[0-9]+([.][0-9]+)?$',myvector,perl=TRUE,invert=TRUE)
> myvector <- c("a3", "N.A", "1.2", "-3", "3-2", "2.")
> grep('^-?[0-9]+([.][0-9]+)?$',myvector,perl=TRUE,invert=TRUE)
[1] 1 2 5 6
The key is to match a number, and then ask for the elements that do not match (invert=TRUE).
^ == start of string
-? == 0 or 1 minus signs
[0-9]+ == one or more digits
(...)? == an optional group, which contains:
[.] == an actual period. I tried to escape this, but it failed
[0-9]+ == followed by one or more digits
$ == followed by the end of the string.
so: optional minus, followed by one or more digits, optionally
followed by (a period with one or more ending digits).
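A minimal sketch of the same idea, assuming the myvector from the original post. It uses grepl() to get a logical vector first and then which() on its negation, which should give the same positions as grep(..., invert=TRUE). The names number_pattern and escaped_pattern are only illustrative. The second pattern escapes the period instead of using [.]; inside an R string literal the backslash itself has to be doubled, which is probably why a single-backslash escape fails.

```r
myvector <- c("a3", "N.A", "1.2", "-3", "3-2", "2.")

# Optional minus, digits, then optionally a period followed by digits.
number_pattern <- "^-?[0-9]+([.][0-9]+)?$"

# TRUE where the whole string looks like a number.
is_number <- grepl(number_pattern, myvector)

# Positions that are NOT numbers.
which(!is_number)   # 1 2 5 6

# Same pattern with an escaped period; note the doubled backslash.
escaped_pattern <- "^-?[0-9]+(\\.[0-9]+)?$"
identical(grepl(escaped_pattern, myvector), is_number)   # TRUE
```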
On Wed, Mar 11, 2015 at 2:27 PM, Adrian Dușa <dusa.adrian at unibuc.ro> wrote:
> Hi everyone,
>
> I need a regular expression to find those positions in a character
> vector which contain something which is not a number (either positive
> or negative, having decimals or not).
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
--
He's about as useful as a wax frying pan.
10 to the 12th power microphones = 1 Megaphone
Maranatha! <><
John McKown
Perfect, perfect, perfect.
Thanks very much, John.
Adrian
On Wed, Mar 11, 2015 at 10:00 PM, John McKown <john.archie.mckown at gmail.com> wrote:
> so: optional minus, followed by one or more digits, optionally
> followed by (a period with one or more ending digits).
How about letting a standard function decide which are numbers:
which(!is.na(suppressWarnings(as.numeric(myvector))))
Also works with numbers in scientific notation. It does not adapt to a
different decimal character, though: as.numeric() expects "." whatever the locale.
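For the original question (positions that are not numbers) the same test can be flipped. A small sketch, assuming the myvector from the first post; parsed is just an illustrative name.

```r
myvector <- c("a3", "N.A", "1.2", "-3", "3-2", "2.")

# Strings that cannot be parsed become NA (with a warning, silenced here).
parsed <- suppressWarnings(as.numeric(myvector))

which(!is.na(parsed))   # positions that parse as numbers: 3 4 6
which(is.na(parsed))    # positions that do not: 1 2 5

# Scientific notation is accepted as well.
as.numeric("1e3")       # 1000
```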
-----Original Message-----
From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of Adrian Dușa
Sent: Thursday, 12 March 2015 8:27a
To: r-help at r-project.org
Subject: [R] regex find anything which is not a number
Hi everyone,
I need a regular expression to find those positions in a character
vector which contain something which is not a number (either positive
or negative, having decimals or not).
On Thu, Mar 12, 2015 at 2:43 PM, Steve Taylor <steve.taylor at aut.ac.nz> wrote:
> How about letting a standard function decide which are numbers:
>
> which(!is.na(suppressWarnings(as.numeric(myvector))))
One problem is that Adrian wanted, for some reason, to exclude numbers
such as "2." but accept "2.0". That is, no unnecessary trailing decimal
point. as.numeric() will not fail on "2." since that is a number. The
example grep() specifically excludes this by requiring at least one
digit after any decimal point.
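To make the difference concrete, here is a rough side-by-side of the two approaches. The vector is the one from the original post with "2.0" appended, which is my own addition for illustration; by_regex, by_parse and only_regex are hypothetical names.

```r
# Original vector plus "2.0" (added here only for the comparison).
myvector <- c("a3", "N.A", "1.2", "-3", "3-2", "2.", "2.0")

# Non-numbers according to the regular expression.
by_regex <- grep("^-?[0-9]+([.][0-9]+)?$", myvector, invert = TRUE)

# Non-numbers according to as.numeric().
by_parse <- which(is.na(suppressWarnings(as.numeric(myvector))))

# Positions rejected by the regex but accepted by as.numeric().
only_regex <- setdiff(by_regex, by_parse)
only_regex             # 6
myvector[only_regex]   # "2."
```

Which one to use depends on whether a trailing decimal point should count as a valid number in the data at hand.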