Malte Hückstädt
2019-Sep-23 07:06 UTC
[R] Average distance in kilometers between subsets of points with ggmap /geosphere
I would like to determine the geographical distances between a number of addresses and compute their mean (the mean distance).

For a data frame with only one row, I have found a solution:

```r
# Load packages
library(readxl)
library(openxlsx)
library(googleway)
#library(sf)
library(tidyverse)
library(geosphere)
library("ggmap")

# Set API key
set_key("")
api_key <- ""
register_google(key=api_key)

# Data
df <- data.frame(
  V1 = c("80538 München, Germany", "01328 Dresden, Germany", "80538 München, Germany",
         "07745 Jena, Germany", "10117 Berlin, Germany"),
  V2 = c("82152 Planegg, Germany", "01069 Dresden, Germany", "82152 Planegg, Germany",
         "07743 Jena, Germany", "14195 Berlin, Germany"),
  V3 = c("85748 Garching, Germany", "01069 Dresden, Germany", "85748 Garching, Germany",
         NA, "10318 Berlin, Germany"),
  V4 = c("80805 München, Germany", "01187 Dresden, Germany", "80805 München, Germany",
         "07745 Jena, Germany", NA),
  stringsAsFactors=FALSE
)

# Replace NA for the geocode function
df[is.na(df)] <- ""

# Slice it
df1 <- slice(df, 5:5)

# lon/lat information
df_2 <- geocode(c(df1$V1, df1$V2,df1$V3, df1$V4)) %>% na.omit()

# To matrix
mat_df <- as.matrix(df_2)

# Distance matrix
dist_mat <- distm(mat_df)

# Mean distance of row 5
mean(dist_mat[lower.tri(dist_mat)])/1000
```

Unfortunately, I have not managed to write a function that runs this code for an entire data set. My current problem is that the function does not calculate the distance averages row-wise, but instead calculates a single average over all rows of the data set.

```r
# Function
Mean_Dist <- function(df,w,x,y,z) {

  # for (row in 1:nrow(df)) {
  #   dist_mat <- geocode(c(w, x, y, z))
  #
  # }

  df <- geocode(c(w, x, y, z)) %>% na.omit() # extract lon/lat information from the addresses

  mat_df <- as.matrix(df) # write it into a matrix

  dist_mat <- distm(mat_df)

  dist_mean <- mean(dist_mat[lower.tri(dist_mat)])

  return(dist_mean)
}

df %>% mutate(lon = Mean_Dist(df,df$V1, df$V2,df$V3, df$V4)/1000)
```

Do you have any idea what mistake I made?
To clarify my question: what I'm trying to create is a data frame like this one (with the new column V5):

```r
  V1                     V2                     V3                      V4                     V5
  <chr>                  <chr>                  <chr>                   <chr>                  <numeric>
1 80538 München, Germany 82152 Planegg, Germany 85748 Garching, Germany 80805 München, Germany Mean_Dist_row1
2 01328 Dresden, Germany 01069 Dresden, Germany 01069 Dresden, Germany  01187 Dresden, Germany Mean_Dist_row2
3 80538 München, Germany 82152 Planegg, Germany 85748 Garching, Germany 80805 München, Germany Mean_Dist_row3
4 07745 Jena, Germany    07743 Jena, Germany    07745 Jena, Germany     07745 Jena, Germany    Mean_Dist_row4
5 10117 Berlin, Germany  14195 Berlin, Germany  10318 Berlin, Germany   14476 Potsdam, Germany Mean_Dist_row5
```

i.e. the average distance for each row.
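One likely reading of the problem: inside `mutate()`, `df$V1`, `df$V2`, and so on are entire columns, so `c(w, x, y, z)` concatenates every address in the data frame, and the function returns a single number that `mutate()` then recycles to all rows. The sketch below illustrates this with base R only. `df_toy` and `n_addresses()` are hypothetical stand-ins (not from the post); the function merely counts the non-empty addresses it receives, so no geocoding or API key is involved.

```r
# Toy data with the same layout as the question's df (2 rows, 4 address columns).
df_toy <- data.frame(
  V1 = c("a1", "b1"), V2 = c("a2", "b2"),
  V3 = c("a3", ""),   V4 = c("a4", "b4"),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for Mean_Dist(): counts the non-empty addresses it is given.
n_addresses <- function(w, x, y, z) {
  addr <- c(w, x, y, z)
  sum(addr != "")
}

# Called with whole columns, the function sees all rows at once and returns ONE value.
all_at_once <- n_addresses(df_toy$V1, df_toy$V2, df_toy$V3, df_toy$V4)

# mapply() calls the function once per row instead and returns one value per row.
per_row <- mapply(n_addresses, df_toy$V1, df_toy$V2, df_toy$V3, df_toy$V4,
                  USE.NAMES = FALSE)

print(all_at_once)  # [1] 7
print(per_row)      # [1] 4 3
```

The same reasoning would apply to `Mean_Dist()`: it needs to be called once per row (for example via `mapply()`, `apply()` with `MARGIN = 1`, or `dplyr::rowwise()`) rather than once with all columns.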
Eric Berger
2019-Sep-23 07:32 UTC
[R] Average distance in kilometers between subsets of points with ggmap /geosphere
Hi Malte,

I only skimmed your question and looked at the desired output. I wondered whether the apply function could meet your needs. Here's a small example that might help you:

m <- matrix(1:9,nrow=3)
m <- cbind(m,apply(m,MARGIN=1,mean)) # MARGIN=1 says to apply the function row-wise
m

#      [,1] [,2] [,3] [,4]
# [1,]    1    4    7    4
# [2,]    2    5    8    5
# [3,]    3    6    9    6

HTH,
Eric

______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Eric Berger
2019-Sep-24 07:50 UTC
[R] Average distance in kilometers between subsets of points with ggmap /geosphere
You are welcome.

On Tue, Sep 24, 2019 at 9:10 AM Malte Hückstädt <deaddatascientists at gmail.com> wrote:
> Hello Eric, thanks a lot! In fact, your tip helped me a lot. I have now
> found a solution with lapply and apply. Thank you very much!
>
> regards, malte
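The thread does not show the final solution. As a rough sketch of how the row-wise `apply()` idea might carry over to the original problem, the code below computes a mean pairwise distance per row. To stay runnable without an API key it rests on several assumptions that are not from the thread: `coords` is a hypothetical lookup table with made-up points standing in for `geocode()`, `haversine_km()` is a base-R great-circle formula standing in for `geosphere::distm()` (so the numbers need not match `distm()` exactly), and `mean_dist_row()` and `df_demo` are illustrative names.

```r
# Hypothetical coordinate lookup (made-up points, NOT real geocoding results).
coords <- data.frame(
  address = c("P1", "P2", "P3"),
  lon     = c(0, 0, 1),
  lat     = c(0, 1, 0),
  stringsAsFactors = FALSE
)

# Great-circle distance in km on a sphere of radius 6371 km (haversine formula).
haversine_km <- function(lon1, lat1, lon2, lat2, r = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

# Mean pairwise distance (km) for ONE row of addresses.
mean_dist_row <- function(addresses) {
  # Drop missing and empty addresses, then look up their coordinates.
  addresses <- addresses[!is.na(addresses) & addresses != ""]
  pts <- coords[match(addresses, coords$address), c("lon", "lat")]
  pts <- pts[stats::complete.cases(pts), , drop = FALSE]
  if (nrow(pts) < 2) return(NA_real_)   # fewer than two points: no distance
  pairs <- utils::combn(nrow(pts), 2)   # each column is one pair of point indices
  mean(haversine_km(pts$lon[pairs[1, ]], pts$lat[pairs[1, ]],
                    pts$lon[pairs[2, ]], pts$lat[pairs[2, ]]))
}

df_demo <- data.frame(
  V1 = c("P1", "P1", "P1"),
  V2 = c("P2", "P2", ""),
  V3 = c("P3", NA, NA),
  stringsAsFactors = FALSE
)

# MARGIN = 1 hands each row to mean_dist_row() as a character vector.
df_demo$V5 <- apply(df_demo, 1, mean_dist_row)
print(df_demo)
```

In this toy data the third row has only one usable address, so its `V5` is `NA`. With real addresses, the lookup in `mean_dist_row()` would presumably be replaced by a geocoding call, ideally one per unique address to limit API requests.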