Hi everyone,

I need a regular expression to find those positions in a character
vector which contain something which is not a number (either positive
or negative, having decimals or not).

myvector <- c("a3", "N.A", "1.2", "-3", "3-2", "2.")

In this vector, only positions 3 and 4 are numbers; the rest should be captured.
So far I am able to detect anything which is not a number, excluding - and .

> grep("[^-0-9.]", myvector)
[1] 1 2

I still need to capture positions 5 and 6, which in human language
would mean to detect anything which contains a "-" or a "." anywhere
else except at the beginning of a number.

Thanks very much in advance,
Adrian

--
Adrian Dusa
University of Bucharest
Romanian Social Data Archive
Soseaua Panduri nr.90
050663 Bucharest sector 5
Romania

See if the following will work for you:

grep('^-?[0-9]+([.][0-9]+)?$',myvector,perl=TRUE,invert=TRUE)

> myvector <- c("a3", "N.A", "1.2", "-3", "3-2", "2.")
> grep('^-?[0-9]+([.][0-9]+)?$',myvector,perl=TRUE,invert=TRUE)
[1] 1 2 5 6
>

The key is to match a number, and then invert the TRUE / FALSE (invert=TRUE).
^ == start of string
-? == 0 or 1 minus signs
[0-9]+ == one or more digits

then, optionally, via use of (...)?
[.] == an actual period. I tried to escape this, but it failed
[0-9]+ == followed by one or more digits

$ == followed by the end of the string.

so: optional minus, followed by one or more digits, optionally
followed by (a period with one or more ending digits).

On Wed, Mar 11, 2015 at 2:27 PM, Adrian Dușa <dusa.adrian at unibuc.ro> wrote:
> Hi everyone,
>
> I need a regular expression to find those positions in a character
> vector which contain something which is not a number (either positive
> or negative, having decimals or not).
>
> myvector <- c("a3", "N.A", "1.2", "-3", "3-2", "2.")
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
He's about as useful as a wax frying pan.

10 to the 12th power microphones = 1 Megaphone

Maranatha! <><
John McKown

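The pattern described above can be checked piece by piece. What follows is a minimal sketch, not part of the original reply: it uses grepl() to get one TRUE/FALSE per element instead of grep(..., invert=TRUE), and the variable names are illustrative. It also shows one possible reason the escaped period failed, which is only an assumption about what was tried: inside an R string literal the backslash itself must be doubled ("\\."), because a single "\." is rejected by the R parser as an unrecognized escape.

```r
myvector <- c("a3", "N.A", "1.2", "-3", "3-2", "2.")

# Pattern from the reply: optional minus, digits, optional ".digits"
pattern <- "^-?[0-9]+([.][0-9]+)?$"

# grepl() returns one logical per element: TRUE where the whole string is a number
is_number <- grepl(pattern, myvector)

# Positions that are NOT numbers; equivalent to grep(pattern, myvector, invert = TRUE)
non_numbers <- which(!is_number)
print(non_numbers)

# Escaped form of the period: the backslash is doubled inside the R string
pattern_escaped <- "^-?[0-9]+(\\.[0-9]+)?$"
same_result <- identical(grepl(pattern_escaped, myvector), is_number)
print(same_result)
```

Keeping the logical vector around makes it easy to subset the data either way, for example myvector[is_number] for the valid entries.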
Perfect, perfect, perfect.
Thanks very much, John.
Adrian

On Wed, Mar 11, 2015 at 10:00 PM, John McKown <john.archie.mckown at gmail.com> wrote:
> See if the following will work for you:
>
> > myvector <- c("a3", "N.A", "1.2", "-3", "3-2", "2.")
> > grep('^-?[0-9]+([.][0-9]+)?$',myvector,perl=TRUE,invert=TRUE)
> [1] 1 2 5 6
>
> The key is to match a number, and then invert the TRUE / FALSE (invert=TRUE).

How about letting a standard function decide which are numbers:

which(!is.na(suppressWarnings(as.numeric(myvector))))

Also works with numbers in scientific notation and (presumably) different decimal characters, e.g. comma if that's what the locale uses.

-----Original Message-----
From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of Adrian Dușa
Sent: Thursday, 12 March 2015 8:27a
To: r-help at r-project.org
Subject: [R] regex find anything which is not a number

Hi everyone,

I need a regular expression to find those positions in a character
vector which contain something which is not a number (either positive
or negative, having decimals or not).

myvector <- c("a3", "N.A", "1.2", "-3", "3-2", "2.")

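As a minimal sketch of the as.numeric() suggestion above (variable names are illustrative, and the "1e3" value is not from the thread): the one-liner as written returns the positions that are numbers, so dropping the `!` gives the non-numeric positions the original question asked for.

```r
myvector <- c("a3", "N.A", "1.2", "-3", "3-2", "2.")

# as.numeric() yields NA (with a warning) for strings it cannot parse
parsed <- suppressWarnings(as.numeric(myvector))

# Positions that parse as numbers (what the one-liner above returns)
numeric_positions <- which(!is.na(parsed))

# Positions that do not parse, which is what the original question asked for
non_numeric_positions <- which(is.na(parsed))

print(numeric_positions)
print(non_numeric_positions)

# Scientific notation is accepted ("1e3" is an illustrative value)
print(as.numeric("1e3"))
```

One caveat with this approach: a genuinely missing value (NA) in the input also comes out as NA, so it cannot be told apart from an unparseable string without an extra check.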
On Thu, Mar 12, 2015 at 2:43 PM, Steve Taylor <steve.taylor at aut.ac.nz> wrote:
> How about letting a standard function decide which are numbers:
>
> which(!is.na(suppressWarnings(as.numeric(myvector))))
>
> Also works with numbers in scientific notation and (presumably) different decimal characters, e.g. comma if that's what the locale uses.

One problem is that Adrian wanted, for some reason, to exclude numbers such as "2." but accept "2.0". That is, no trailing decimal point without digits after it. as.numeric() will not fail on "2." since that is a valid number. The example grep() specifically excludes this by requiring at least one digit after any decimal point.
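The difference between the two approaches can be seen side by side. This is a rough sketch under stated assumptions: only "2." comes from the thread, the other inputs are made up for illustration, and the column names are hypothetical.

```r
# Illustrative inputs; only "2." comes from the thread
candidates <- c("2.", "2.0", "-3", "1e3", ".5")

# The strict pattern from the earlier reply
strict_pattern <- "^-?[0-9]+([.][0-9]+)?$"

comparison <- data.frame(
  input = candidates,
  regex_accepts = grepl(strict_pattern, candidates),
  as_numeric_accepts = !is.na(suppressWarnings(as.numeric(candidates))),
  stringsAsFactors = FALSE
)
print(comparison)

# Inputs on which the two approaches disagree
disagreements <- comparison$input[
  comparison$regex_accepts != comparison$as_numeric_accepts
]
print(disagreements)
```

Which behaviour is wanted depends on the data: the regex enforces one specific written form, while as.numeric() accepts anything R can read as a number.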