Hello everyone,

I've downloaded Jeffrey Breen's R package "zipcode", which has the latitude and longitude for all US zip codes. It provides a data.frame with 43,191 observations; that's one data frame in my environment.

Then I have another data.frame with over 100,000 observations that look like this:

waltham, Massachusetts 02451
Columbia, SC 29209

Wheat Ridge , Colorado 80033
Charlottesville, Virginia 22902
Fairbanks, AK 99709
Montpelier, VT 05602
Dobbs Ferry, New York 10522

Henderson , Kentucky 42420

The blank lines represent missing values in the column. Regardless, I need to figure out how to write code that would match the zip codes and add another column to the data frame with the latitude and longitude. For example, the code would recognize 02451 above and write 42.3765° N, 71.2356° W in the column next to it, since that's the latitude and longitude for Waltham, Massachusetts.

Any idea how to begin writing code that would perform such an operation?

Again, I have one data.frame with the zip codes linked to their latitudes and longitudes, and another data.frame with only zip codes (and some holes). I need to add the corresponding latitudes/longitudes to the latter data.frame.

Nicola
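The output format described above (42.3765° N, 71.2356° W) can be produced from numeric coordinate columns with sprintf(). This is a minimal sketch, assuming the looked-up coordinates are signed decimal degrees (negative longitude for west, negative latitude for south); the function name format_latlon is hypothetical.

```r
# Hypothetical helper: turn signed decimal degrees into strings such as
# "42.3765° N, 71.2356° W". Assumes numeric lat/lon, with west/south negative.
format_latlon <- function(lat, lon) {
  ns <- ifelse(lat >= 0, "N", "S")
  ew <- ifelse(lon >= 0, "E", "W")
  out <- sprintf("%.4f\u00b0 %s, %.4f\u00b0 %s", abs(lat), ns, abs(lon), ew)
  # keep rows with missing coordinates (e.g. no zip code) as NA
  out[is.na(lat) | is.na(lon)] <- NA_character_
  out
}

format_latlon(42.3765, -71.2356)
```

Keeping the numeric latitude/longitude columns alongside the formatted string (e.g. df$latlon <- format_latlon(df$latitude, df$longitude)) leaves the numbers available for plotting later.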
Hi Nicola,
Getting the blank rows will be a bit more difficult and I don't see
why they should be in the final data frame, so:
# sample data: one "town, state zip" string per row
townzip<-read.table(text="waltham, Massachusetts 02451
Columbia, SC 29209
Wheat Ridge , Colorado 80033
Charlottesville, Virginia 22902
Fairbanks, AK 99709
Montpelier, VT 05602
Dobbs Ferry, New York 10522
Henderson , Kentucky 42420",
sep="\t",stringsAsFactors=FALSE)
# split "town, state zip" into town, state and zip
zip_split<-function(x) {
commasplit<-unlist(strsplit(x,","))
# remove the digits to get the state, remove the letters to get the zip
state<-trimws(gsub("[[:digit:]]","",commasplit[2]))
zip<-trimws(gsub("[[:alpha:]]","",commasplit[2]))
return(c(trimws(commasplit[1]),state,zip))
}
townzipsplit<-as.data.frame(t(sapply(townzip$V1,zip_split)))
rownames(townzipsplit)<-NULL
names(townzipsplit)<-c("town","state","zip")
townzipsplit$latlon<-NA
# I don't know the name of the zipcode column in the "zipcode" data frame
newzipdf<-merge(townzipsplit,zipcodedf,by.x="zip",by.y="zip")
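merge() does an inner join by default, so rows whose zip is missing or unmatched are dropped, and the result is re-sorted by zip (all.x=TRUE keeps the unmatched rows but still reorders them). If the holes should stay in their original positions, match() is one alternative. A sketch, assuming the lookup table has columns named zip, latitude and longitude (Nicola's later merge on "zip" suggests the zip column name; the coordinate column names are an assumption). The toy lookup contains only the Waltham coordinates quoted in the thread, and add_latlon is a hypothetical name.

```r
# Hypothetical helper: look up latitude/longitude for each zip in 'df'
# without dropping or reordering rows. Zips that are missing or not found
# in 'lookup' get NA coordinates.
add_latlon <- function(df, lookup) {
  idx <- match(df$zip, lookup$zip)  # NA where there is no match
  df$latitude <- lookup$latitude[idx]
  df$longitude <- lookup$longitude[idx]
  df
}

# Toy lookup table: only the Waltham row, using the coordinates from the thread
lookup <- data.frame(zip = "02451", latitude = 42.3765, longitude = -71.2356,
                     stringsAsFactors = FALSE)
towns <- data.frame(town = c("waltham", NA, "Columbia"),
                    zip = c("02451", NA, "29209"),
                    stringsAsFactors = FALSE)
add_latlon(towns, lookup)
```

Here Columbia gets NA only because the toy lookup omits it. The zip columns should be character on both sides: read as numbers, "02451" becomes 2451 and no longer matches.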
Jim
On Tue, May 14, 2019 at 5:57 AM Nicola Ruggiero <nicola.ruggiero.unt at gmail.com> wrote:
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
Hi Jim,
I ended up collaborating with someone and, after looking at your code
(we did consider it and talk it over), we came up with this:
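library(stringr)
library(zipcode)  # Jeffrey Breen's package; provides the "zipcode" data frame
data(zipcode)
# extract the first number (optionally with a leading "-" or an embedded
# ","), e.g. "02451" from "Massachusetts 02451"
numextract <- function(string){
str_extract(string, "\\-*\\d+\\,*\\d*")
}
# the zip codes are embedded in the "state" column; store them in a new column
myDataSet$zip<-numextract(myDataSet$state)
# join on zip; rows without a matching zip code are dropped (inner join)
combineddata<-merge(zipcode, myDataSet, by.x="zip",
by.y="zip")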
So, as I understand it, we built a function whose purpose was to
extract the numerical value from a string, stored that value in a new
column, and then merged the two data frames together. It worked!
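The pattern above returns the first run of digits in the string. Since US zip codes are five digits, a stricter pattern is one alternative; here is a base-R sketch (no stringr needed), assuming each value holds at most one 5-digit zip, as in "Massachusetts 02451". The name extract_zip is hypothetical.

```r
# Hypothetical helper: pull the first 5-digit zip code out of each string.
# Returns a character vector (leading zeros kept), NA where no zip is found.
extract_zip <- function(x) {
  x <- as.character(x)
  pos <- regexpr("[0-9]{5}", x)           # start of first match, -1 if none
  out <- rep(NA_character_, length(x))
  hit <- !is.na(pos) & pos > 0
  out[hit] <- substr(x[hit], pos[hit], pos[hit] + 4L)
  out
}

extract_zip(c("Massachusetts 02451", "New York 10522", "", NA))
```

Returning character rather than numbers keeps "02451" intact, which matters when matching against a character zip column.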
Now I just need to figure out this thing called shape data (shapefiles).
Basically, I need to figure out how to draw a map of the United States
underneath my data points so that I can see each point over the location
it corresponds to.
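For the mapping step, one option that avoids shapefiles altogether is the state outlines bundled with the maps package; note this swaps in a different technique from the shape data mentioned above. The sketch assumes maps is installed and that the merged data has numeric longitude and latitude columns in signed decimal degrees (assumed column names). map("state") covers only the 48 contiguous states, so points such as Fairbanks, AK fall outside the drawn area. The function name is hypothetical.

```r
# Sketch: draw US state outlines, then overlay points at each zip's location.
# Assumes the 'maps' package is installed and 'df' has numeric 'longitude'
# and 'latitude' columns (signed decimal degrees, west negative).
library(maps)

plot_on_us_map <- function(df) {
  # outlines of the 48 contiguous states (Alaska and Hawaii are not included)
  map("state")
  # add the points on top of the existing map
  points(df$longitude, df$latitude, pch = 20, col = "red")
  invisible(df)
}

# usage, with the Waltham coordinates quoted in the thread:
plot_on_us_map(data.frame(latitude = 42.3765, longitude = -71.2356))
```

Because map() draws in plain longitude/latitude coordinates when no projection is given, points() can use the coordinate columns directly.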
Nicola
On Mon, May 13, 2019 at 9:09 PM Jim Lemon <drjimlemon at gmail.com> wrote: