thr3ads.net - R help - [R] regex find anything which is not a number [Mar 2015]

Adrian Dușa

2015-Mar-11 19:27 UTC

[R] regex find anything which is not a number

Hi everyone,

I need a regular expression to find those positions in a character
vector which contain something which is not a number (either positive
or negative, having decimals or not).

myvector <- c("a3", "N.A", "1.2", "-3", "3-2", "2.")

In this vector, only positions 3 and 4 are numbers; the rest should be captured.
So far I am able to detect anything which is not a number, except for cases involving "-" and ".":
> grep("[^-0-9.]", myvector)
[1] 1 2

I still need to capture positions 5 and 6, which in human language
would mean detecting anything which contains a "-" or a "." anywhere
except at the beginning of a number.

Thanks very much in advance,
Adrian


-- 
Adrian Dusa
University of Bucharest
Romanian Social Data Archive
Soseaua Panduri nr.90
050663 Bucharest sector 5
Romania
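Adrian's first attempt checks only which characters occur in each element, not where they occur. That is why "3-2" and "2." slip through: every character in them belongs to the allowed set. A quick logical view of the same test, sketched with base R's grepl() (the variable name is illustrative):

```r
myvector <- c("a3", "N.A", "1.2", "-3", "3-2", "2.")

# TRUE where an element contains at least one character other than
# "-", a digit or "." (inside [^...], a leading "-" and a "." are literal)
has_other_char <- grepl("[^-0-9.]", myvector)
has_other_char
# -> TRUE TRUE FALSE FALSE FALSE FALSE

# "3-2" and "2." use only allowed characters, so a character class cannot
# flag them; their problem is the position of "-" and ".", which needs a
# pattern anchored to the whole string
myvector[!has_other_char]
# -> "1.2" "-3" "3-2" "2."
```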

John McKown

2015-Mar-11 20:00 UTC

[R] regex find anything which is not a number

See if the following will work for you:

grep('^-?[0-9]+([.][0-9]+)?$', myvector, perl=TRUE, invert=TRUE)

> myvector <- c("a3", "N.A", "1.2", "-3", "3-2", "2.")
> grep('^-?[0-9]+([.][0-9]+)?$', myvector, perl=TRUE, invert=TRUE)
[1] 1 2 5 6

The key is to match a number, and then invert the TRUE / FALSE (invert=TRUE).
^ == start of string
-? == 0 or 1 minus signs
[0-9]+ == one or more digits

optionally followed by this group, made optional by wrapping it in (...)?
[.] == an actual period. I tried to escape this, but it failed.
[0-9]+ == followed by one or more digits

$ == followed by the end of the string.

so: optional minus, followed by one or more digits, optionally
followed by (a period with one or more ending digits).


-- 
He's about as useful as a wax frying pan.

10 to the 12th power microphones = 1 Megaphone

Maranatha! <><
John McKown
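John's note that escaping the period "failed" is not explained in the thread. One likely cause (an assumption) is R's own string syntax: the regex \. has to be written as "\\." inside an R string, and a single backslash ("\.") is rejected by the parser as an unrecognized escape. A small sketch, with illustrative variable names, checking that both spellings of the pattern agree and that invert = TRUE returns the same indices as which(!grepl(...)):

```r
myvector <- c("a3", "N.A", "1.2", "-3", "3-2", "2.")

# Optional minus, one or more digits, then optionally a period plus one or more digits
pat_bracket <- "^-?[0-9]+([.][0-9]+)?$"
# Same pattern with an escaped period; the R string needs "\\." to pass \. to the regex engine
pat_escaped <- "^-?[0-9]+(\\.[0-9]+)?$"

not_numbers  <- grep(pat_bracket, myvector, perl = TRUE, invert = TRUE)
not_numbers2 <- grep(pat_escaped, myvector, perl = TRUE, invert = TRUE)

# invert = TRUE returns the indices of the elements that do NOT match,
# i.e. the same as negating grepl() and taking which()
not_numbers3 <- which(!grepl(pat_bracket, myvector, perl = TRUE))

not_numbers  # -> 1 2 5 6
```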

Adrian Dușa

2015-Mar-11 20:20 UTC

[R] regex find anything which is not a number

Perfect, perfect, perfect.
Thanks very much, John.
Adrian

-- 
Adrian Dusa
University of Bucharest
Romanian Social Data Archive
Soseaua Panduri nr.90
050663 Bucharest sector 5
Romania

Steve Taylor

2015-Mar-12 19:43 UTC

[R] regex find anything which is not a number

How about letting a standard function decide which are numbers:

which(!is.na(suppressWarnings(as.numeric(myvector))))

Also works with numbers in scientific notation and (presumably) different
decimal characters, e.g. comma if that's what the locale uses.
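Steve's one-liner returns the positions that are numbers, whereas Adrian asked for the opposite, so the complement is which(is.na(...)). A minimal sketch of both directions (the variable name is illustrative). Note that on this vector the complement is 1 2 5 rather than 1 2 5 6, because "2." parses as the number 2:

```r
myvector <- c("a3", "N.A", "1.2", "-3", "3-2", "2.")

# as.numeric() returns NA (with a warning, silenced here) for strings it cannot parse
parsed <- suppressWarnings(as.numeric(myvector))

which(!is.na(parsed))  # positions that parse as numbers: 3 4 6
which(is.na(parsed))   # the complement, i.e. the "not a number" positions: 1 2 5
```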



John McKown

2015-Mar-12 19:52 UTC

[R] regex find anything which is not a number

On Thu, Mar 12, 2015 at 2:43 PM, Steve Taylor <steve.taylor at aut.ac.nz> wrote:
> How about letting a standard function decide which are numbers:
>
> which(!is.na(suppressWarnings(as.numeric(myvector))))
>
> Also works with numbers in scientific notation and (presumably) different
> decimal characters, e.g. comma if that's what the locale uses.

One problem is that Adrian wanted, for some reason, to exclude numbers
such as "2." but accept "2.0"; that is, no unnecessary trailing
decimal point. as.numeric() will not fail on "2.", since that is a
valid number. The grep() example specifically excludes it by requiring
at least one digit after any decimal point.

-- 
He's about as useful as a wax frying pan.

10 to the 12th power microphones = 1 Megaphone

Maranatha! <><
John McKown
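The difference John points out is easy to see side by side. Besides "2.", as.numeric() also accepts scientific notation and "Inf", which the anchored regex rejects. On the comma question, as far as I know as.numeric() always expects "." as the decimal mark regardless of locale, so "1,5" fails under both rules (type.convert() and read.table() offer a dec argument for comma decimals). The helper names below are hypothetical, for illustration only:

```r
# Hypothetical helpers: the strict regex rule vs. "anything as.numeric() can parse"
is_strict_number <- function(x) grepl("^-?[0-9]+([.][0-9]+)?$", x)
is_parsable      <- function(x) !is.na(suppressWarnings(as.numeric(x)))

extra <- c("2.", "2.0", "1e3", "Inf", "1,5")
comparison <- data.frame(x = extra,
                         regex = is_strict_number(extra),
                         as_numeric = is_parsable(extra))
comparison
# regex:      FALSE TRUE FALSE FALSE FALSE
# as_numeric: TRUE  TRUE TRUE  TRUE  FALSE
```

Which rule is right depends on what should count as "a number" in the data; the regex is stricter and fully explicit, while as.numeric() follows R's own parsing rules.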
