Hello everyone,

I've downloaded Jeffrey Breen's R package "zipcode", which has the latitude and longitude for all US zip codes. Its data frame has 43,191 observations, and it is one data frame in my environment.

I also have another data frame with over 100,000 observations that look like this:

waltham, Massachusetts 02451
Columbia, SC 29209

Wheat Ridge , Colorado 80033
Charlottesville, Virginia 22902
Fairbanks, AK 99709
Montpelier, VT 05602
Dobbs Ferry, New York 10522

Henderson , Kentucky 42420

The blank lines represent missing values in the column. I need to write code that matches the zip codes and adds a column with the latitude and longitude to this data frame. For example, the code would recognize 02451 above and write 42.3765° N, 71.2356° W in the column next to it, since that's the latitude and longitude of Waltham, Massachusetts.

Any idea how to begin writing code that would perform such an operation?

To recap: I have one data frame linking zip codes to latitudes and longitudes, and another data frame with only zip codes (and some gaps). I need to add the corresponding latitudes/longitudes to the latter data frame.

Nicola
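A minimal base-R sketch of the lookup described above, assuming the zip code is the only five-digit number in each entry and that the lookup table has a character zip column plus numeric latitude and longitude columns. The lookup and places data frames are hypothetical: only the Waltham coordinates come from this message, so the other rows come back NA. Using match() instead of merge() keeps every row, blanks included, in the original order.

```r
# Hypothetical lookup table standing in for the full zip code table;
# only the Waltham coordinates come from this thread.
lookup <- data.frame(zip = "02451", latitude = 42.3765, longitude = -71.2356,
                     stringsAsFactors = FALSE)

# Hypothetical slice of the large data set; "" marks a missing entry.
places <- data.frame(location = c("waltham, Massachusetts 02451",
                                  "Columbia, SC 29209",
                                  ""),
                     stringsAsFactors = FALSE)

# Extract the first run of five digits; entries without one keep NA.
hit <- regexpr("[0-9]{5}", places$location)
places$zip <- NA_character_
places$zip[hit > 0] <- regmatches(places$location, hit)

# match() gives the lookup row for each zip (NA when absent or missing),
# so every row of 'places' is kept, in its original order.
idx <- match(places$zip, lookup$zip)
places$latitude  <- lookup$latitude[idx]
places$longitude <- lookup$longitude[idx]

# Optional text column in the "42.3765 deg N, 71.2356 deg W" style.
places$latlon <- ifelse(
  is.na(places$latitude), NA_character_,
  sprintf("%.4f\u00b0 %s, %.4f\u00b0 %s",
          abs(places$latitude),  ifelse(places$latitude  >= 0, "N", "S"),
          abs(places$longitude), ifelse(places$longitude >= 0, "E", "W")))
print(places)
```

With the real data, lookup would be the full zip code table. Keeping zip as character matters: read as a number, 02451 would become 2451 and no longer match.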
Hi Nicola,
Getting the blank rows will be a bit more difficult, and I don't see why they should be in the final data frame, so:

# read.table() skips the blank lines (blank.lines.skip = TRUE by default)
townzip<-read.table(text="waltham, Massachusetts 02451
Columbia, SC 29209

Wheat Ridge , Colorado 80033
Charlottesville, Virginia 22902
Fairbanks, AK 99709
Montpelier, VT 05602
Dobbs Ferry, New York 10522

Henderson , Kentucky 42420",
 sep="\t",stringsAsFactors=FALSE)
# split one "town, state zip" string into town, state and zip
zip_split<-function(x) {
  commasplit<-unlist(strsplit(x,","))
  # state: the part after the comma with the digits removed
  state<-trimws(gsub("[[:digit:]]","",commasplit[2]))
  # zip: the part after the comma with the letters removed
  zip<-trimws(gsub("[[:alpha:]]","",commasplit[2]))
  return(c(trimws(commasplit[1]),state,zip))
}
townzipsplit<-as.data.frame(t(sapply(townzip$V1,zip_split)),
 stringsAsFactors=FALSE)
rownames(townzipsplit)<-NULL
names(townzipsplit)<-c("town","state","zip")
townzipsplit$latlon<-NA
# I don't know the name of the zipcode column in the "zipcode" data frame
# merge() keeps only the zips found in both data frames (an inner join)
newzipdf<-merge(townzipsplit,zipcodedf,by.x="zip",by.y="zip")

Jim

On Tue, May 14, 2019 at 5:57 AM Nicola Ruggiero <nicola.ruggiero.unt at gmail.com> wrote:
> [...]
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
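For comparison, a hedged sketch that does the same split in one vectorized step, using sub() with a single Perl-style regular expression (three capture groups for town, state and zip) instead of strsplit() inside sapply(). It assumes each non-blank entry looks like "town, state 12345"; entries that do not match, such as blank rows, become NA instead of being dropped. The names entries, pattern, field and parsed are illustrative only.

```r
# Illustrative input, including one blank entry.
entries <- c("waltham, Massachusetts 02451",
             "Columbia, SC 29209",
             "Wheat Ridge , Colorado 80033",
             "",
             "Dobbs Ferry, New York 10522")

# Group 1: town (before the comma), group 2: state, group 3: 5-digit zip.
# Surrounding whitespace is consumed outside the groups, so no trimws().
pattern <- "^\\s*([^,]*?)\\s*,\\s*(.*?)\\s+([0-9]{5})\\s*$"
ok <- grepl(pattern, entries, perl = TRUE)

# Pull one capture group out of every entry; non-matching entries get NA.
field <- function(group) ifelse(ok, sub(pattern, group, entries, perl = TRUE), NA)

parsed <- data.frame(town  = field("\\1"),
                     state = field("\\2"),
                     zip   = field("\\3"),   # stays character: "02451" keeps its 0
                     stringsAsFactors = FALSE)
print(parsed)
```

Because the NA rows are kept, a later merge(..., all.x = TRUE) or match() lookup can preserve them if the gaps are wanted in the final data frame.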
Hi Jim,

I ended up collaborating with someone, and after looking at your code (we did take it into consideration and discuss it), we came up with this:

library(stringr)
library(zipcode)   # Jeffrey Breen's package; provides the 'zipcode' data frame
data(zipcode)
# extract the first number (optionally with a leading minus sign
# or a comma) from each string
numextract <- function(string){
  str_extract(string, "\\-*\\d+\\,*\\d*")
}
# the zip code sits at the end of the 'state' column, e.g. "Massachusetts 02451"
myDataSet$zip<-numextract(myDataSet$state)
combineddata<-merge(zipcode, myDataSet, by.x="zip", by.y="zip")

So, as I understand it, we built a function that extracts the numeric part of a string, stored the result in a new zip column, and then merged the two data frames on that column. It worked!

Now I just need to figure out this thing called shape data. Basically, I need to draw an outline of the United States underneath my data points so that each point appears over the location it corresponds to.

Nicola

On Mon, May 13, 2019 at 9:09 PM Jim Lemon <drjimlemon at gmail.com> wrote:
> [...]
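One way to put the points over a map of the US, sketched under the assumption that the maps package is installed and that combineddata has numeric longitude and latitude columns (the names used in the zipcode package's data frame). The pts data frame is a hypothetical stand-in holding only the Waltham coordinates from this thread. maps::map("state") draws only the lower 48 states, so points in Alaska (e.g. Fairbanks) or Hawaii would fall outside the outline.

```r
# Requires the 'maps' package: install.packages("maps")
library(maps)

# Hypothetical stand-in for 'combineddata'; in practice use
# combineddata$longitude and combineddata$latitude.
pts <- data.frame(longitude = -71.2356, latitude = 42.3765)

# Draw the state outlines; with no projection the plot coordinates are
# plain longitude/latitude, so points can be added on the same scale.
maps::map("state")

# Overlay the zip code locations on top of the outline.
points(pts$longitude, pts$latitude, pch = 19, col = "red")
```

If you prefer ggplot2, its map_data("state") function (which also relies on maps) returns the same outlines as a data frame that can be drawn with geom_polygon() underneath geom_point().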