Omar André Gonzáles Díaz
2015-Oct-08 22:45 UTC
[R] regex - extracting 2 numbers and " from strings
Hi, I have a vector of 100 elements like these:

a <- c("SMART TV LCD FHD 70\" LC70LE660", "LED FULL HD 58'' LE58D3140")

I want to put just the (70\") and (58'') in a vector b.

This is my attempt, but it is not working:

b <- grepl('^[0-9]{2}""$',a)

Any hint is welcome, thanks.
On Thu, Oct 08, 2015 at 05:45:13PM -0500, Omar André Gonzáles Díaz wrote:
> Hi, I have a vector of 100 elements like these:
>
> a <- c("SMART TV LCD FHD 70\" LC70LE660", "LED FULL HD 58'' LE58D3140")
>
> I want to put just the (70\") and (58'') in a vector b.
>
> This is my attempt, but it is not working:
>
> b <- grepl('^[0-9]{2}""$',a)
>
> Any hint is welcome, thanks.
> ...

Perhaps:

> a <- c("SMART TV LCD FHD 70\" LC70LE660", "LED FULL HD 58'' LE58D3140")
> b <- sub('^.* ([0-9]{2}(\'\'|")) .*$', "\\1", a)
> b
[1] "70\"" "58''"
>

Peace,
david
--
David H. Wolfskill r at catwhisker.org
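The reply above does not say why the original attempt fails, so here is a minimal sketch of the difference, using only the two sample strings from the question. The variable names `attempt` and `extracted` are made up for illustration. `grepl()` only reports whether a pattern matches, and the anchored pattern in the question demands that the whole string be two digits followed by two double-quote characters, so it cannot return the size itself.

```r
# Sample data from the original question.
a <- c("SMART TV LCD FHD 70\" LC70LE660", "LED FULL HD 58'' LE58D3140")

# grepl() returns TRUE/FALSE per element, never the matched text.
# '^' and '$' anchor the pattern to the whole string, so neither element matches.
attempt <- grepl('^[0-9]{2}""$', a)
print(attempt)    # FALSE FALSE

# sub() with a capture group replaces the whole string by the captured part:
# two digits followed by either '' or ", surrounded by spaces.
extracted <- sub('^.* ([0-9]{2}(\'\'|")) .*$', "\\1", a)
print(extracted)
```

Note that `sub()` leaves a string unchanged when the pattern does not match, so unmatched elements come back in full rather than as `NA`.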
On Oct 8, 2015, at 3:45 PM, Omar André Gonzáles Díaz wrote:
> Hi, I have a vector of 100 elements like these:
>
> a <- c("SMART TV LCD FHD 70\" LC70LE660", "LED FULL HD 58'' LE58D3140")
>
> I want to put just the (70\") and (58'') in a vector b.

> sub("(^.+ )(\\d+)([\"]|[']{2})(.+$)", "\\2\\3", a)
[1] "70\"" "58''"

Also, the `stringr` package uses the code in the `stringi` package to give more compact expressions. You might want to look at:

str_extract      Extract matching patterns from a string.
str_extract_all  Extract matching patterns from a string.

> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius
Alameda, CA, USA
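The "extract the match" idea mentioned above can also be sketched in base R, without installing `stringr`. This is only an illustration, not the code from the reply: the third element of `a` and the name `size_pattern` are assumptions added here to show what happens when a string contains no size.

```r
# Sample data from the thread; the third element is an added no-match case.
a <- c("SMART TV LCD FHD 70\" LC70LE660",
       "LED FULL HD 58'' LE58D3140",
       "LED FULL HD LE58D3140")

# One or more digits followed by either a double quote or two single quotes.
size_pattern <- "[0-9]+(\"|'')"

# regexpr() locates the first match in each string (-1 where there is none).
m <- regexpr(size_pattern, a)

# regmatches() returns only the matched pieces, so fill a result vector
# in which non-matching elements stay NA.
b <- rep(NA_character_, length(a))
b[m != -1] <- regmatches(a, m)
print(b)
```

Unlike the `sub()` approach, this does not depend on what comes before or after the size, and an element without a size shows up as `NA` instead of being returned unchanged.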
On Oct 8, 2015, at 4:50 PM, Omar André Gonzáles Díaz wrote:
> David, it does work but not in all cases:

It should work if you change the "+" to "*" in the last capture group. That makes the characters after the size entirely optional.

> sub("(^.+ )(\\d+)([\"]|[']{2})(.*$)", "\\2\\3", b)
 [1] "40''" "40''" "49\"" "49\"" "28\"" "40\"" "32''" "32''" "40\"" "55\""
[11] "40\"" "24\"" "42''" "50\"" "48\"" "48\"" "48\"" "48''" "50\"" "50''"
[21] "50\"" "55\"" "55''" "55\"" "55''" "55\"" "65''" "65\"" "65''" "75\""

Moral of the story: Always post an example with the necessary complexity.

> This is now my b vector, after your solution:
>
> b <- c("40''", "40''", "49\"", "49\"", "HAIER TELEVISOR LED LE28F6600 28\"",
> "40\"", "32''", "32''", "40\"", "55\"", "HAIER TV LED LE40B8000 FULL HD 40\"",
> "24\"", "42''", "HAIER TELEVISOR LED LE50K5000N 50\"", "48\"",
> "48\"", "48\"", "48''", "50\"", "50''", "50\"", "55\"", "55''",
> "55\"", "55''", "55\"", "65''", "SAMSUNG SMART TV 65JU6500 LED UHD 65\"",
> "65''", "75\"")

David Winsemius
Alameda, CA, USA
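To make the "+" versus "*" point concrete, here is a small side-by-side sketch using two strings taken from the thread. The names `x`, `with_plus`, and `with_star` are made up for illustration.

```r
# Two elements taken from the thread: one with text after the size, one
# where the size is the last thing in the string.
x <- c("SMART TV LCD FHD 70\" LC70LE660",
       "HAIER TELEVISOR LED LE28F6600 28\"")

# ".+" in the final group demands at least one character after the size,
# so the second string does not match and sub() returns it unchanged.
with_plus <- sub("(^.+ )(\\d+)([\"]|[']{2})(.+$)", "\\2\\3", x)

# ".*" also accepts an empty remainder, so both strings match.
with_star <- sub("(^.+ )(\\d+)([\"]|[']{2})(.*$)", "\\2\\3", x)

print(with_plus)
print(with_star)
```

Both patterns still require at least one character and a space before the digits, which is why the already-extracted values such as "40''" pass through unchanged in the output shown in the reply.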
Omar André Gonzáles Díaz
2015-Oct-09 18:53 UTC
[R] regex - extracting 2 numbers and " from strings
Yes, you are right. Thank you.