Eric Berger
2022-Nov-06 15:19 UTC
[R] Selecting a minimum value of an attribute associated with point values neighboring a given point and assigning it as a new attribute
Hi Tiffany, Here is some code that might help with your problem. I solve a "toy" problem that is conceptually the same. Part 1 sets up my toy problem. You would have to replace Part 1 with your real case. The main point is to define a function f(i, j, data) which returns 0 or 1 depending on whether the observations in rows i and j in your dataset 'data' are within your cutoff distance (i.e. 50m). You can then use Part 2 almost without changes (except for changing 'myData' to the correct name of your data). I hope this helps, Eric library(dplyr) ## PART 1: create fake data for minimal example set.seed(123) ## for reproducibility N <- 5 ## replace by number of locations (approx 9000 in your case) MAX_DISTANCE <- 2 ## 50 in your case myData <- data.frame(x=rnorm(N),y=rnorm(N),Conc=sample(1:N,N)) ## The function which you must re-define for your actual case. f <- function(i,j,a) { dist <- sqrt(sum((a[i,1:2] - a[j,1:2])^2)) ## Euclidean distance as.integer(dist < MAX_DISTANCE) } ## PART 2: You can use this code on the real data with f() defined appropriately A <- matrix(0,N,N) ## get the indices (j,k) where j < k (as columns in a data.frame) idx <- expand.grid(v,v) |> rename(j=Var1,k=Var2) |> filter(j < k) u <- sapply(1:nrow(idx),\(i){ j <- idx$j[i]; k <- idx$k[i]; A[j,k] <<- f(j,k,myData) }) B <- A + t(A) + diag(N) C <- diag(myData$Conc) D <- B %*% C D[D==0] <- NA myData$Conc_min <- apply(D,MAR=1,\(v){min(v,na.rm=TRUE)}) print(head(myData)) On Sat, Nov 5, 2022 at 5:14 PM Bert Gunter <bgunter.4567 at gmail.com> wrote:> > Probably better posted on R-sig-geo. > > -- Bert > > On Sat, Nov 5, 2022 at 12:36 AM Duhl, Tiffany R. <Tiffany.Duhl at tufts.edu> > wrote: > > > Hello, > > > > I have sets of spatial points with LAT, LON coords (unprojected, WGS84 > > datum) and several value attributes associated with each point, from > > numerous csv files (with an average of 6,000-9,000 points in each file) as > > shown in the following example: > > > > data<- read.csv("R_find_pts_testdata.csv") > > > > > data > > ID Date Time LAT LON Conc > > Leg.Speed CO2 H2O BC61 Hr Min Sec > > 1 76 4/19/2021 21:25:38 42.40066 -70.98802 99300 0.0 mph 428.39 9.57 > > 578 21 25 38 > > 2 77 4/19/2021 21:25:39 42.40066 -70.98802 96730 0.0 mph 428.04 9.57 > > 617 21 25 39 > > 3 79 4/19/2021 21:25:41 42.40066 -70.98802 98800 0.2 mph 427.10 9.57 > > 1027 21 25 41 > > 4 80 4/19/2021 21:25:42 42.40066 -70.98802 96510 2 mph 427.99 9.58 > > 1381 21 25 42 > > 5 81 4/19/2021 21:25:43 42.40067 -70.98801 95540 3 mph 427.99 9.58 > > 1271 21 25 43 > > 6 82 4/19/2021 21:25:44 42.40068 -70.98799 94720 4 mph 427.20 9.57 > > 910 21 25 44 > > 7 83 4/19/2021 21:25:45 42.40069 -70.98797 94040 5 mph 427.18 9.57 > > 652 21 25 45 > > 8 84 4/19/2021 21:25:46 42.40072 -70.98795 95710 7 mph 427.07 9.57 > > 943 21 25 46 > > 9 85 4/19/2021 21:25:47 42.40074 -70.98792 96200 8 mph 427.44 9.56 > > 650 21 25 47 > > 10 86 4/19/2021 21:25:48 42.40078 -70.98789 93750 10 mph 428.76 9.57 > > 761 21 25 48 > > 11 87 4/19/2021 21:25:49 42.40081 -70.98785 93360 11 mph 429.25 9.56 > > 1158 21 25 49 > > 12 88 4/19/2021 21:25:50 42.40084 -70.98781 94340 12 mph 429.56 9.57 > > 107 21 25 50 > > 13 89 4/19/2021 21:25:51 42.40087 -70.98775 92780 12 mph 428.62 9.56 > > 720 21 25 51 > > > > > > What I want to do is, for each point, identify all points within 50m of > > that point, find the minimum value of the "Conc" attribute of each nearby > > set of points (including the original point) and then create a new variable > > ("Conc_min") and assign this minimum value to a new variable added to > > "data". > > > > So far, I have the following code: > > > > library(spdep) > > library(sf) > > > > setwd("C:\\mydirectory\\") > > data<- read.csv("R_find_pts_testdata.csv") > > > > #make sure the data is a data frame > > pts <- data.frame(data) > > > > #create spatial data frame and define projection > > pts_coords <- cbind(pts$LON, pts$LAT) > > data_pts <- SpatialPointsDataFrame(coords= pts_coords, > > data=pts, proj4string = CRS("+proj=longlat +datum=WGS84")) > > > > #Re-project to WGS 84 / UTM zone 18N, so the analysis is in units of m > > ptsUTM <- sf::st_as_sf(data_pts, coords = c("LAT", "LON"), remove = F)%>% > > st_transform(32618) > > > > #create 50 m buffer around each point then intersect with points and > > finally find neighbors within the buffers > > pts_buf <- sf::st_buffer(ptsUTM, 50) > > coords <- sf::st_coordinates(ptsUTM) > > int <- sf::st_intersects(pts_buf, ptsUTM) > > x <- spdep::dnearneigh(coords, 0, 50) > > > > Now at this point, I'm not sure what to either the "int" (a sgbp list) or > > "x" (nb object) objects (or even if I need them both) > > > > > int > > Sparse geometry binary predicate list of length 974, where the predicate > > was `intersects' > > first 10 elements: > > 1: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, ... > > 2: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, ... > > 3: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, ... > > 4: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, ... > > 5: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, ... > > 6: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, ... > > 7: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, ... > > 8: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, ... > > 9: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, ... > > > > > x > > Neighbour list object: > > Number of regions: 974 > > Number of nonzero links: 34802 > > Percentage nonzero weights: 3.668481 > > Average number of links: 35.73101 > > > > One thought is that maybe I don't need the dnearneigh function and can > > instead convert "int" into a dataframe and somehow merge or associate > > (perhaps with an inner join) the ID fields of the buffered and intersecting > > points and then compute the minimum value of "Conc" grouping by ID: > > > > > as.data.frame(int) > > row.id col.id > > 1 1 1 > > 2 1 2 > > 3 1 3 > > 4 1 4 > > 5 1 5 > > 6 1 6 > > 7 1 7 > > 8 1 8 > > 9 1 9 > > 10 1 10 > > 11 1 11 > > 12 1 12 > > 13 1 13 > > 14 1 14 > > 15 1 15 > > 16 1 16 > > 17 1 17 > > 18 1 18 > > 19 2 1 > > 20 2 2 > > 21 2 3 > > 22 2 4 > > 23 2 5 > > 24 2 6 > > 25 2 7 > > 26 2 8 > > 27 2 9 > > 28 2 10 > > > > > > So in the above example I'd like to take the minimum of "Conc" among the > > col.id points grouped with row.id 1 (i.e., col.ids 1-18) and assign the > > minimum value of this group as a new variable in data (Data$Conc_min), and > > do the same for row.id 2 and all the rest of the rows. > > > > I'm just not sure how to do this and I appreciate any help folks might > > have on this matter! > > > > Many thanks, > > -Tiffany > > > > [[alternative HTML version deleted]] > > > > ______________________________________________ > > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide > > http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. > > > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.
Eric Berger
2022-Nov-06 15:27 UTC
[R] Selecting a minimum value of an attribute associated with point values neighboring a given point and assigning it as a new attribute
Whoops ... left out a line in Part 2. Resending with the correction ## PART 2: You can use this code on the real data with f() defined appropriately A <- matrix(0,N,N) v <- 1:N ## get the indices (j,k) where j < k (as columns in a data.frame) idx <- expand.grid(v,v) |> rename(j=Var1,k=Var2) |> filter(j < k) u <- sapply(1:nrow(idx), \(i){ j <- idx$j[i]; k <- idx$k[i]; A[j,k] <<- f(j,k,myData) }) B <- A + t(A) + diag(N) C <- diag(myData$Conc) D <- B %*% C D[D==0] <- NA myData$Conc_min <- apply(D,MAR=1,\(v){min(v,na.rm=TRUE)}) print(head(myData)) On Sun, Nov 6, 2022 at 5:19 PM Eric Berger <ericjberger at gmail.com> wrote:> > Hi Tiffany, > Here is some code that might help with your problem. I solve a "toy" > problem that is conceptually the same. > Part 1 sets up my toy problem. You would have to replace Part 1 with > your real case. The main point is to define > a function f(i, j, data) which returns 0 or 1 depending on whether the > observations in rows i and j in your dataset 'data' > are within your cutoff distance (i.e. 50m). > > You can then use Part 2 almost without changes (except for changing > 'myData' to the correct name of your data). > > I hope this helps, > Eric > > library(dplyr) > > ## PART 1: create fake data for minimal example > set.seed(123) ## for reproducibility > N <- 5 ## replace by number of locations (approx 9000 in your case) > MAX_DISTANCE <- 2 ## 50 in your case > myData <- data.frame(x=rnorm(N),y=rnorm(N),Conc=sample(1:N,N)) > > ## The function which you must re-define for your actual case. > f <- function(i,j,a) { > dist <- sqrt(sum((a[i,1:2] - a[j,1:2])^2)) ## Euclidean distance > as.integer(dist < MAX_DISTANCE) > } > > ## PART 2: You can use this code on the real data with f() defined appropriately > A <- matrix(0,N,N) > ## get the indices (j,k) where j < k (as columns in a data.frame) > idx <- expand.grid(v,v) |> rename(j=Var1,k=Var2) |> filter(j < k) > u <- sapply(1:nrow(idx),\(i){ j <- idx$j[i]; k <- idx$k[i]; A[j,k] <<- > f(j,k,myData) }) > B <- A + t(A) + diag(N) > C <- diag(myData$Conc) > D <- B %*% C > D[D==0] <- NA > myData$Conc_min <- apply(D,MAR=1,\(v){min(v,na.rm=TRUE)}) > print(head(myData)) > > > On Sat, Nov 5, 2022 at 5:14 PM Bert Gunter <bgunter.4567 at gmail.com> wrote: > > > > Probably better posted on R-sig-geo. > > > > -- Bert > > > > On Sat, Nov 5, 2022 at 12:36 AM Duhl, Tiffany R. <Tiffany.Duhl at tufts.edu> > > wrote: > > > > > Hello, > > > > > > I have sets of spatial points with LAT, LON coords (unprojected, WGS84 > > > datum) and several value attributes associated with each point, from > > > numerous csv files (with an average of 6,000-9,000 points in each file) as > > > shown in the following example: > > > > > > data<- read.csv("R_find_pts_testdata.csv") > > > > > > > data > > > ID Date Time LAT LON Conc > > > Leg.Speed CO2 H2O BC61 Hr Min Sec > > > 1 76 4/19/2021 21:25:38 42.40066 -70.98802 99300 0.0 mph 428.39 9.57 > > > 578 21 25 38 > > > 2 77 4/19/2021 21:25:39 42.40066 -70.98802 96730 0.0 mph 428.04 9.57 > > > 617 21 25 39 > > > 3 79 4/19/2021 21:25:41 42.40066 -70.98802 98800 0.2 mph 427.10 9.57 > > > 1027 21 25 41 > > > 4 80 4/19/2021 21:25:42 42.40066 -70.98802 96510 2 mph 427.99 9.58 > > > 1381 21 25 42 > > > 5 81 4/19/2021 21:25:43 42.40067 -70.98801 95540 3 mph 427.99 9.58 > > > 1271 21 25 43 > > > 6 82 4/19/2021 21:25:44 42.40068 -70.98799 94720 4 mph 427.20 9.57 > > > 910 21 25 44 > > > 7 83 4/19/2021 21:25:45 42.40069 -70.98797 94040 5 mph 427.18 9.57 > > > 652 21 25 45 > > > 8 84 4/19/2021 21:25:46 42.40072 -70.98795 95710 7 mph 427.07 9.57 > > > 943 21 25 46 > > > 9 85 4/19/2021 21:25:47 42.40074 -70.98792 96200 8 mph 427.44 9.56 > > > 650 21 25 47 > > > 10 86 4/19/2021 21:25:48 42.40078 -70.98789 93750 10 mph 428.76 9.57 > > > 761 21 25 48 > > > 11 87 4/19/2021 21:25:49 42.40081 -70.98785 93360 11 mph 429.25 9.56 > > > 1158 21 25 49 > > > 12 88 4/19/2021 21:25:50 42.40084 -70.98781 94340 12 mph 429.56 9.57 > > > 107 21 25 50 > > > 13 89 4/19/2021 21:25:51 42.40087 -70.98775 92780 12 mph 428.62 9.56 > > > 720 21 25 51 > > > > > > > > > What I want to do is, for each point, identify all points within 50m of > > > that point, find the minimum value of the "Conc" attribute of each nearby > > > set of points (including the original point) and then create a new variable > > > ("Conc_min") and assign this minimum value to a new variable added to > > > "data". > > > > > > So far, I have the following code: > > > > > > library(spdep) > > > library(sf) > > > > > > setwd("C:\\mydirectory\\") > > > data<- read.csv("R_find_pts_testdata.csv") > > > > > > #make sure the data is a data frame > > > pts <- data.frame(data) > > > > > > #create spatial data frame and define projection > > > pts_coords <- cbind(pts$LON, pts$LAT) > > > data_pts <- SpatialPointsDataFrame(coords= pts_coords, > > > data=pts, proj4string = CRS("+proj=longlat +datum=WGS84")) > > > > > > #Re-project to WGS 84 / UTM zone 18N, so the analysis is in units of m > > > ptsUTM <- sf::st_as_sf(data_pts, coords = c("LAT", "LON"), remove = F)%>% > > > st_transform(32618) > > > > > > #create 50 m buffer around each point then intersect with points and > > > finally find neighbors within the buffers > > > pts_buf <- sf::st_buffer(ptsUTM, 50) > > > coords <- sf::st_coordinates(ptsUTM) > > > int <- sf::st_intersects(pts_buf, ptsUTM) > > > x <- spdep::dnearneigh(coords, 0, 50) > > > > > > Now at this point, I'm not sure what to either the "int" (a sgbp list) or > > > "x" (nb object) objects (or even if I need them both) > > > > > > > int > > > Sparse geometry binary predicate list of length 974, where the predicate > > > was `intersects' > > > first 10 elements: > > > 1: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, ... > > > 2: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, ... > > > 3: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, ... > > > 4: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, ... > > > 5: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, ... > > > 6: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, ... > > > 7: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, ... > > > 8: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, ... > > > 9: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, ... > > > > > > > x > > > Neighbour list object: > > > Number of regions: 974 > > > Number of nonzero links: 34802 > > > Percentage nonzero weights: 3.668481 > > > Average number of links: 35.73101 > > > > > > One thought is that maybe I don't need the dnearneigh function and can > > > instead convert "int" into a dataframe and somehow merge or associate > > > (perhaps with an inner join) the ID fields of the buffered and intersecting > > > points and then compute the minimum value of "Conc" grouping by ID: > > > > > > > as.data.frame(int) > > > row.id col.id > > > 1 1 1 > > > 2 1 2 > > > 3 1 3 > > > 4 1 4 > > > 5 1 5 > > > 6 1 6 > > > 7 1 7 > > > 8 1 8 > > > 9 1 9 > > > 10 1 10 > > > 11 1 11 > > > 12 1 12 > > > 13 1 13 > > > 14 1 14 > > > 15 1 15 > > > 16 1 16 > > > 17 1 17 > > > 18 1 18 > > > 19 2 1 > > > 20 2 2 > > > 21 2 3 > > > 22 2 4 > > > 23 2 5 > > > 24 2 6 > > > 25 2 7 > > > 26 2 8 > > > 27 2 9 > > > 28 2 10 > > > > > > > > > So in the above example I'd like to take the minimum of "Conc" among the > > > col.id points grouped with row.id 1 (i.e., col.ids 1-18) and assign the > > > minimum value of this group as a new variable in data (Data$Conc_min), and > > > do the same for row.id 2 and all the rest of the rows. > > > > > > I'm just not sure how to do this and I appreciate any help folks might > > > have on this matter! > > > > > > Many thanks, > > > -Tiffany > > > > > > [[alternative HTML version deleted]] > > > > > > ______________________________________________ > > > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > > > https://stat.ethz.ch/mailman/listinfo/r-help > > > PLEASE do read the posting guide > > > http://www.R-project.org/posting-guide.html > > > and provide commented, minimal, self-contained, reproducible code. > > > > > > > [[alternative HTML version deleted]] > > > > ______________________________________________ > > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code.