Markus Weisner
2012-Oct-26 16:57 UTC
[R] using match-type function to return correctly ordered data from a dataframe
I am regularly running into a problem where I can't seem to figure out how to
maintain correct data order when selecting data out of a dataframe. The
below code shows an example of trying to pull data from a dataframe using
ordered zip codes. My problem is returning the pulled data in the correct
order. This is a very simple example, but it illustrates a regular problem
that I am running into.
In the past, I have used fairly complicated solutions to pull this off.
There has got to be a more simple and straightforward method ... probably
some function that I missed in all my googling.
Thanks in advance for anybody's help figuring this out.
~Markus
### Function Definitions ###
# FUNCTION #1 (returns wrong order)
getLatitude1 = function(myzips) {
# load libraries and data
library(zipcode)
data(zipcode)
# get latitude values
mylats = zipcode[zipcode$zip %in% myzips, "latitude"] # problem is that this code does not maintain order
# return data
return(mylats)
}
# FUNCTION #2 (also returns wrong order)
getLatitude2 = function(myzips) {
# load libraries and data
library(zipcode)
data(zipcode)
# convert myzips to DF
myzips = as.data.frame(as.character(myzips))
# merge in zipcode data based on zip
results = merge(myzips, zipcode[,c("zip", "latitude")],
by.x="as.character(myzips)", by.y="zip", all.x=TRUE)
# return data
return(results$latitude)
}
### Code ###
# specify a set of zip codes
myzips = c("74432", "72537", "06026",
"01085", "65793")
# create a DF
myzips.df = data.frame(zip=myzips, latitude=NA, longitude=NA)
# look at data to determine what should be returned and in what order
library(zipcode)
data(zipcode)
zipcode[zipcode$zip %in% myzips,]
# test function #1 (function definition above)
myzips.df$latitude = getLatitude1(myzips.df$zip) #returns wrong order
# test function #2 (function definition above)
myzips.df$latitude = getLatitude2(myzips.df$zip) #also returns wrong order
# need "myzips %in% zipcode$zip" to return array/df indices rather than logical
Jeff Newmiller
2012-Oct-27 06:00 UTC
[R] using match-type function to return correctly ordered data from a dataframe
Have you actually read
?"%in%"
?
Although a valuable tool, not all answers are most effectively obtained by
Googling.
Also, your repeated assertions that the answers are not maintained in order are
poorly framed. They DO stay in order according to the zipcode database order.
That said, your desire for numeric indexes is only as far away as your help
file.
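For readers following along: the help page for %in% also documents match(),
which returns integer positions rather than logicals. Below is a minimal
sketch of how that could be used here. It assumes a small made-up table
(zip_table, with placeholder latitude values that are not real coordinates)
in place of the zipcode dataset, and the helper name get_latitude is
hypothetical, not something from the original post.

```r
# Hypothetical stand-in for the zipcode data frame. The latitude values
# are placeholders chosen for illustration, not real coordinates.
zip_table <- data.frame(
  zip      = c("01085", "06026", "65793", "72537", "74432"),
  latitude = c(1, 2, 3, 4, 5),
  stringsAsFactors = FALSE
)

myzips <- c("74432", "72537", "06026", "01085", "65793")

# Logical indexing walks the table row by row, so the result follows
# the table's order, not the order of myzips.
lat_table_order <- zip_table[zip_table$zip %in% myzips, "latitude"]

# match() gives, for each element of myzips, its row position in the
# table (NA if the zip code is absent). Here idx is 5 4 2 1 3.
idx <- match(myzips, zip_table$zip)

# Indexing with those positions returns values in the order of myzips.
lat_query_order <- zip_table$latitude[idx]

# Hypothetical helper wrapping the same idea. as.character() guards
# against the zips arriving as a factor column of a data frame.
get_latitude <- function(zips, table) {
  table$latitude[match(as.character(zips), table$zip)]
}
```

With this approach a zip code missing from the table yields NA in its slot
instead of silently shortening the result, so the output always has the same
length as the input.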
---------------------------------------------------------------------------
Jeff Newmiller
DCN: <jdnewmil at dcn.davis.ca.us>
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
---------------------------------------------------------------------------
Sent from my phone. Please excuse my brevity.