Dear R People:

Here is a goofy question:

I want to extract the zip code from an address and here is my work so far:

> add1
                  results.formatted_address
"200 W Rosamond St, Houston, TX 77076, USA"
> add1[1][32:36]
<NA> <NA> <NA> <NA> <NA>
  NA   NA   NA   NA   NA
> str(add1)
 Named chr "200 W Rosamond St, Houston, TX 77076, USA"
 - attr(*, "names")= chr "results.formatted_address"
>

What am I not seeing, please?

Thanks,
Erin

--
Erin Hodgess
Associate Professor
Department of Computer and Mathematical Sciences
University of Houston - Downtown
mailto: erinm.hodgess at gmail.com
It's best if you make these things available to us using dput() in the future.

You're probably looking for the substr() function. Since _strings_ (not characters) in R are "primitive" (not in the primitive/internal sense, just in the primordial sense), you can't subset them with the bracket operators. The [ operator picks elements of the vector, and add1 has only one element, so what you're doing is something closer to

x <- 1:5
x[30:35]

which returns NA for every index past the end of the vector.

Cheers,
Michael

On Sun, Aug 12, 2012 at 10:33 PM, Erin Hodgess <erinm.hodgess at gmail.com> wrote:
> I want to extract the zip code from an address and here is my work so far:
>
>> add1[1][32:36]
> <NA> <NA> <NA> <NA> <NA>
>   NA   NA   NA   NA   NA
>
> What am I not seeing, please?
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
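To make the element-versus-character distinction concrete, here is a minimal sketch. It assumes add1 is the named character vector shown in the original post (rebuilt by hand here, since no dput() output was posted); the names bad and zip are only for illustration.

```r
# Sample value reconstructed from the original post (an assumption, not dput() output).
add1 <- c(results.formatted_address = "200 W Rosamond St, Houston, TX 77076, USA")

length(add1)   # number of elements (strings) in the vector
nchar(add1)    # number of characters inside that one string

# `[` indexes elements, so elements 32 to 36 do not exist and come back as NA.
bad <- unname(add1[1][32:36])

# substr() indexes characters within each string instead.
zip <- unname(substr(add1, 32, 36))
zip

# dput() prints a copy-and-paste representation suitable for posting to the list.
dput(add1)
```

The unname() calls are only there to drop the "results.formatted_address" label from the printed result.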
Hi Erin,

The first element of the character vector is a string. You cannot extract individual characters from a string with [; try something like ?substr (see also ?nchar), or perhaps better, use regular expressions to extract things between commas after two characters (or whatever logical rule accurately gets the zip code).

Cheers,
Josh

--
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/
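One possible reading of the "between commas after two characters" rule, as a rough sketch only: look for a comma, a two-letter state code, and then capture the five digits that follow. The pattern and the variable names below are assumptions for illustration, not something taken from the post.

```r
add1 <- "200 W Rosamond St, Houston, TX 77076, USA"

# comma, optional spaces, two capital letters, spaces, then five captured digits
pattern <- ",\\s*[A-Z]{2}\\s+([0-9]{5})"

# regexec() keeps the capture groups; regmatches() returns a list with one
# character vector per input string: the full match, then each group.
m <- regmatches(add1, regexec(pattern, add1))
zip <- m[[1]][2]
zip
```

If the pattern does not match, m[[1]] is empty and zip comes out as NA, which is easier to spot than a wrong value.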
Hi,

Try this:

add11 <- strsplit(add1, split = ",")
gsub("TX", "", add11[[1]][3])
#[1] "  77076"

A.K.

----- Original Message -----
From: Erin Hodgess <erinm.hodgess at gmail.com>
To: R help <r-help at stat.math.ethz.ch>
Sent: Sunday, August 12, 2012 11:33 PM
Subject: [R] named character question
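The gsub() result above still carries the leading spaces and only works for Texas. A small variation on the same strsplit() idea, assuming the state and zip code are always the third comma-separated field (an assumption about the data, not something the post guarantees), might look like this:

```r
add1 <- "200 W Rosamond St, Houston, TX 77076, USA"

# Split on literal commas and strip the surrounding whitespace from each piece.
parts <- trimws(strsplit(add1, split = ",", fixed = TRUE)[[1]])

# Assumed layout: street, city, "ST 12345", country.
state_zip <- parts[3]

# Split the third field on whitespace: state code first, zip code second.
fields <- strsplit(state_zip, "\\s+")[[1]]
zip <- fields[2]
zip
```

trimws() needs R 3.2.0 or later; on older versions gsub("^\\s+|\\s+$", "", x) does the same job.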
On Aug 12, 2012, at 8:33 PM, Erin Hodgess wrote:

> I want to extract the zip code from an address and here is my work
> so far:
>
>> add1
>                   results.formatted_address
> "200 W Rosamond St, Houston, TX 77076, USA"

> ttt <- "200 W Rosamond St, Houston, TX 77076, USA"
> sub("^.+,.+,\\s[[:alpha:]]*\\s([[:digit:]]{5}).+", "\\1", ttt)
[1] "77076"

You will need to determine whether all your addresses have two commas before the two-letter state designation. You may not need as specific a pattern as this. An alternate pattern:

> sub("^.+\\s[[:alpha:]]{2}\\s([[:digit:]]{5}).+", "\\1", ttt)
[1] "77076"

--
David Winsemius, MD
Alameda, CA, USA
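One thing worth knowing when checking whether every address fits the pattern: sub() returns its input unchanged when nothing matches, so a failed match can pass for a result. The sketch below wraps the two patterns above in a hypothetical helper, extract_zip(), that returns NA instead; the second address is made up to show the one-comma case.

```r
addrs <- c("200 W Rosamond St, Houston, TX 77076, USA",
           "Houston, TX 77076, USA")   # hypothetical: only one comma before the state

pat_commas <- "^.+,.+,\\s[[:alpha:]]*\\s([[:digit:]]{5}).+"
pat_state  <- "^.+\\s[[:alpha:]]{2}\\s([[:digit:]]{5}).+"

# Hypothetical helper: NA where the pattern does not match,
# rather than the untouched input that sub() would hand back.
extract_zip <- function(x, pattern) {
  ifelse(grepl(pattern, x), sub(pattern, "\\1", x), NA_character_)
}

res_commas <- extract_zip(addrs, pat_commas)  # second address fails the two-comma rule
res_state  <- extract_zip(addrs, pat_state)   # both addresses match
res_commas
res_state
```

Counting the NA values in the result is then a quick way to see how many addresses the chosen pattern misses.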
You are treating add1 as if it were a vector of single characters, but it is a vector holding one string. If you want the zip code and you know its position within the string, use

substr(add1[1], 32, 36)

If you don't know the position, you could use the following (but it will return the first run of 5 digits it finds, whether or not that is the zip code):

regmatches(add1, regexpr("[[:digit:]]{5}", add1))

----------------------------------------------
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77843-4352
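The "any 5 digit number" caveat bites when the street number itself has five digits. As a sketch under that assumption (the address below is invented for the purpose), one workaround is to collect every five-digit run with gregexpr() and keep the last one, on the guess that the zip code comes after the street number:

```r
addr <- "12345 Main St, Houston, TX 77076, USA"   # hypothetical address

# regexpr() stops at the first match, which here is the street number.
first_hit <- regmatches(addr, regexpr("[[:digit:]]{5}", addr))

# gregexpr() finds all matches; regmatches() returns them as a list per string.
all_hits <- regmatches(addr, gregexpr("[[:digit:]]{5}", addr))[[1]]

# Keep the last run, assuming the zip code follows the street number.
last_hit <- all_hits[length(all_hits)]
last_hit
```

Taking the last match is still a heuristic; anchoring on the state code, as in the other replies, is stricter.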
Hi,

One more method to extract the code:

add1 <- "200 W Rosamond St, Houston, TX 77076, USA"
sub(".*\\s.*\\s.*\\s.*\\s.*\\s.*\\s([[:digit:]]{5}).*", "\\1", add1)
#[1] "77076"
#or,
sub(".*\\s+([[:digit:]]{5}).*", "\\1", add1)
#[1] "77076"

A.K.