Eric Berger
2022-Nov-06 15:27 UTC
[R] Selecting a minimum value of an attribute associated with point values neighboring a given point and assigning it as a new attribute
Whoops ... left out a line in Part 2. Resending with the correction
## PART 2: You can use this code on the real data with f() defined appropriately
A <- matrix(0,N,N)
v <- 1:N
## get the indices (j,k) where j < k (as columns in a data.frame)
idx <- expand.grid(v,v) |> rename(j=Var1,k=Var2) |> filter(j < k)
u <- sapply(1:nrow(idx),
\(i){ j <- idx$j[i]; k <- idx$k[i]; A[j,k] <<-
f(j,k,myData) })
B <- A + t(A) + diag(N)
C <- diag(myData$Conc)
D <- B %*% C
D[D==0] <- NA
myData$Conc_min <- apply(D,MAR=1,\(v){min(v,na.rm=TRUE)})
print(head(myData))
On Sun, Nov 6, 2022 at 5:19 PM Eric Berger <ericjberger at gmail.com>
wrote:>
> Hi Tiffany,
> Here is some code that might help with your problem. I solve a
"toy"
> problem that is conceptually the same.
> Part 1 sets up my toy problem. You would have to replace Part 1 with
> your real case. The main point is to define
> a function f(i, j, data) which returns 0 or 1 depending on whether the
> observations in rows i and j in your dataset 'data'
> are within your cutoff distance (i.e. 50m).
>
> You can then use Part 2 almost without changes (except for changing
> 'myData' to the correct name of your data).
>
> I hope this helps,
> Eric
>
> library(dplyr)
>
> ## PART 1: create fake data for minimal example
> set.seed(123) ## for reproducibility
> N <- 5 ## replace by number of locations (approx 9000 in your
case)
> MAX_DISTANCE <- 2 ## 50 in your case
> myData <- data.frame(x=rnorm(N),y=rnorm(N),Conc=sample(1:N,N))
>
> ## The function which you must re-define for your actual case.
> f <- function(i,j,a) {
> dist <- sqrt(sum((a[i,1:2] - a[j,1:2])^2)) ## Euclidean distance
> as.integer(dist < MAX_DISTANCE)
> }
>
> ## PART 2: You can use this code on the real data with f() defined
appropriately
> A <- matrix(0,N,N)
> ## get the indices (j,k) where j < k (as columns in a data.frame)
> idx <- expand.grid(v,v) |> rename(j=Var1,k=Var2) |> filter(j <
k)
> u <- sapply(1:nrow(idx),\(i){ j <- idx$j[i]; k <- idx$k[i]; A[j,k]
<<-
> f(j,k,myData) })
> B <- A + t(A) + diag(N)
> C <- diag(myData$Conc)
> D <- B %*% C
> D[D==0] <- NA
> myData$Conc_min <- apply(D,MAR=1,\(v){min(v,na.rm=TRUE)})
> print(head(myData))
>
>
> On Sat, Nov 5, 2022 at 5:14 PM Bert Gunter <bgunter.4567 at
gmail.com> wrote:
> >
> > Probably better posted on R-sig-geo.
> >
> > -- Bert
> >
> > On Sat, Nov 5, 2022 at 12:36 AM Duhl, Tiffany R. <Tiffany.Duhl at
tufts.edu>
> > wrote:
> >
> > > Hello,
> > >
> > > I have sets of spatial points with LAT, LON coords (unprojected,
WGS84
> > > datum) and several value attributes associated with each point,
from
> > > numerous csv files (with an average of 6,000-9,000 points in each
file) as
> > > shown in the following example:
> > >
> > > data<- read.csv("R_find_pts_testdata.csv")
> > >
> > > > data
> > > ID Date Time LAT LON
Conc
> > > Leg.Speed CO2 H2O BC61 Hr Min Sec
> > > 1 76 4/19/2021 21:25:38 42.40066 -70.98802 99300 0.0 mph
428.39 9.57
> > > 578 21 25 38
> > > 2 77 4/19/2021 21:25:39 42.40066 -70.98802 96730 0.0 mph
428.04 9.57
> > > 617 21 25 39
> > > 3 79 4/19/2021 21:25:41 42.40066 -70.98802 98800 0.2 mph
427.10 9.57
> > > 1027 21 25 41
> > > 4 80 4/19/2021 21:25:42 42.40066 -70.98802 96510 2 mph
427.99 9.58
> > > 1381 21 25 42
> > > 5 81 4/19/2021 21:25:43 42.40067 -70.98801 95540 3 mph
427.99 9.58
> > > 1271 21 25 43
> > > 6 82 4/19/2021 21:25:44 42.40068 -70.98799 94720 4 mph
427.20 9.57
> > > 910 21 25 44
> > > 7 83 4/19/2021 21:25:45 42.40069 -70.98797 94040 5 mph
427.18 9.57
> > > 652 21 25 45
> > > 8 84 4/19/2021 21:25:46 42.40072 -70.98795 95710 7 mph
427.07 9.57
> > > 943 21 25 46
> > > 9 85 4/19/2021 21:25:47 42.40074 -70.98792 96200 8 mph
427.44 9.56
> > > 650 21 25 47
> > > 10 86 4/19/2021 21:25:48 42.40078 -70.98789 93750 10 mph
428.76 9.57
> > > 761 21 25 48
> > > 11 87 4/19/2021 21:25:49 42.40081 -70.98785 93360 11 mph
429.25 9.56
> > > 1158 21 25 49
> > > 12 88 4/19/2021 21:25:50 42.40084 -70.98781 94340 12 mph
429.56 9.57
> > > 107 21 25 50
> > > 13 89 4/19/2021 21:25:51 42.40087 -70.98775 92780 12 mph
428.62 9.56
> > > 720 21 25 51
> > >
> > >
> > > What I want to do is, for each point, identify all points within
50m of
> > > that point, find the minimum value of the "Conc"
attribute of each nearby
> > > set of points (including the original point) and then create a
new variable
> > > ("Conc_min") and assign this minimum value to a new
variable added to
> > > "data".
> > >
> > > So far, I have the following code:
> > >
> > > library(spdep)
> > > library(sf)
> > >
> > > setwd("C:\\mydirectory\\")
> > > data<- read.csv("R_find_pts_testdata.csv")
> > >
> > > #make sure the data is a data frame
> > > pts <- data.frame(data)
> > >
> > > #create spatial data frame and define projection
> > > pts_coords <- cbind(pts$LON, pts$LAT)
> > > data_pts <- SpatialPointsDataFrame(coords= pts_coords,
> > > data=pts, proj4string = CRS("+proj=longlat
+datum=WGS84"))
> > >
> > > #Re-project to WGS 84 / UTM zone 18N, so the analysis is in units
of m
> > > ptsUTM <- sf::st_as_sf(data_pts, coords = c("LAT",
"LON"), remove = F)%>%
> > > st_transform(32618)
> > >
> > > #create 50 m buffer around each point then intersect with points
and
> > > finally find neighbors within the buffers
> > > pts_buf <- sf::st_buffer(ptsUTM, 50)
> > > coords <- sf::st_coordinates(ptsUTM)
> > > int <- sf::st_intersects(pts_buf, ptsUTM)
> > > x <- spdep::dnearneigh(coords, 0, 50)
> > >
> > > Now at this point, I'm not sure what to either the
"int" (a sgbp list) or
> > > "x" (nb object) objects (or even if I need them both)
> > >
> > > > int
> > > Sparse geometry binary predicate list of length 974, where the
predicate
> > > was `intersects'
> > > first 10 elements:
> > > 1: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, ...
> > > 2: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, ...
> > > 3: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, ...
> > > 4: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, ...
> > > 5: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, ...
> > > 6: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, ...
> > > 7: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, ...
> > > 8: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, ...
> > > 9: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, ...
> > >
> > > > x
> > > Neighbour list object:
> > > Number of regions: 974
> > > Number of nonzero links: 34802
> > > Percentage nonzero weights: 3.668481
> > > Average number of links: 35.73101
> > >
> > > One thought is that maybe I don't need the dnearneigh
function and can
> > > instead convert "int" into a dataframe and somehow
merge or associate
> > > (perhaps with an inner join) the ID fields of the buffered and
intersecting
> > > points and then compute the minimum value of "Conc"
grouping by ID:
> > >
> > > > as.data.frame(int)
> > > row.id col.id
> > > 1 1 1
> > > 2 1 2
> > > 3 1 3
> > > 4 1 4
> > > 5 1 5
> > > 6 1 6
> > > 7 1 7
> > > 8 1 8
> > > 9 1 9
> > > 10 1 10
> > > 11 1 11
> > > 12 1 12
> > > 13 1 13
> > > 14 1 14
> > > 15 1 15
> > > 16 1 16
> > > 17 1 17
> > > 18 1 18
> > > 19 2 1
> > > 20 2 2
> > > 21 2 3
> > > 22 2 4
> > > 23 2 5
> > > 24 2 6
> > > 25 2 7
> > > 26 2 8
> > > 27 2 9
> > > 28 2 10
> > >
> > >
> > > So in the above example I'd like to take the minimum of
"Conc" among the
> > > col.id points grouped with row.id 1 (i.e., col.ids 1-18) and
assign the
> > > minimum value of this group as a new variable in data
(Data$Conc_min), and
> > > do the same for row.id 2 and all the rest of the rows.
> > >
> > > I'm just not sure how to do this and I appreciate any help
folks might
> > > have on this matter!
> > >
> > > Many thanks,
> > > -Tiffany
> > >
> > > [[alternative HTML version deleted]]
> > >
> > > ______________________________________________
> > > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more,
see
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide
> > > http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible
code.
> > >
> >
> > [[alternative HTML version deleted]]
> >
> > ______________________________________________
> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
Duhl, Tiffany R.
2022-Nov-06 16:40 UTC
[R] [External] Re: Selecting a minimum value of an attribute associated with point values neighboring a given point and assigning it as a new attribute
Thanks so much Eric!
I'm going to play around with your toy code (pun intended) & see if I
can make it work for my application.
Cheers,
-Tiffany
________________________________
From: Eric Berger <ericjberger at gmail.com>
Sent: Sunday, November 6, 2022 10:27 AM
To: Bert Gunter <bgunter.4567 at gmail.com>
Cc: Duhl, Tiffany R. <Tiffany.Duhl at tufts.edu>; R-help <R-help at
r-project.org>
Subject: [External] Re: [R] Selecting a minimum value of an attribute associated
with point values neighboring a given point and assigning it as a new attribute
Whoops ... left out a line in Part 2. Resending with the correction
## PART 2: You can use this code on the real data with f() defined appropriately
A <- matrix(0,N,N)
v <- 1:N
## get the indices (j,k) where j < k (as columns in a data.frame)
idx <- expand.grid(v,v) |> rename(j=Var1,k=Var2) |> filter(j < k)
u <- sapply(1:nrow(idx),
\(i){ j <- idx$j[i]; k <- idx$k[i]; A[j,k] <<-
f(j,k,myData) })
B <- A + t(A) + diag(N)
C <- diag(myData$Conc)
D <- B %*% C
D[D==0] <- NA
myData$Conc_min <- apply(D,MAR=1,\(v){min(v,na.rm=TRUE)})
print(head(myData))
On Sun, Nov 6, 2022 at 5:19 PM Eric Berger <ericjberger at gmail.com>
wrote:>
> Hi Tiffany,
> Here is some code that might help with your problem. I solve a
"toy"
> problem that is conceptually the same.
> Part 1 sets up my toy problem. You would have to replace Part 1 with
> your real case. The main point is to define
> a function f(i, j, data) which returns 0 or 1 depending on whether the
> observations in rows i and j in your dataset 'data'
> are within your cutoff distance (i.e. 50m).
>
> You can then use Part 2 almost without changes (except for changing
> 'myData' to the correct name of your data).
>
> I hope this helps,
> Eric
>
> library(dplyr)
>
> ## PART 1: create fake data for minimal example
> set.seed(123) ## for reproducibility
> N <- 5 ## replace by number of locations (approx 9000 in your
case)
> MAX_DISTANCE <- 2 ## 50 in your case
> myData <- data.frame(x=rnorm(N),y=rnorm(N),Conc=sample(1:N,N))
>
> ## The function which you must re-define for your actual case.
> f <- function(i,j,a) {
> dist <- sqrt(sum((a[i,1:2] - a[j,1:2])^2)) ## Euclidean distance
> as.integer(dist < MAX_DISTANCE)
> }
>
> ## PART 2: You can use this code on the real data with f() defined
appropriately
> A <- matrix(0,N,N)
> ## get the indices (j,k) where j < k (as columns in a data.frame)
> idx <- expand.grid(v,v) |> rename(j=Var1,k=Var2) |> filter(j <
k)
> u <- sapply(1:nrow(idx),\(i){ j <- idx$j[i]; k <- idx$k[i]; A[j,k]
<<-
> f(j,k,myData) })
> B <- A + t(A) + diag(N)
> C <- diag(myData$Conc)
> D <- B %*% C
> D[D==0] <- NA
> myData$Conc_min <- apply(D,MAR=1,\(v){min(v,na.rm=TRUE)})
> print(head(myData))
>
>
> On Sat, Nov 5, 2022 at 5:14 PM Bert Gunter <bgunter.4567 at
gmail.com> wrote:
> >
> > Probably better posted on R-sig-geo.
> >
> > -- Bert
> >
> > On Sat, Nov 5, 2022 at 12:36 AM Duhl, Tiffany R. <Tiffany.Duhl at
tufts.edu>
> > wrote:
> >
> > > Hello,
> > >
> > > I have sets of spatial points with LAT, LON coords (unprojected,
WGS84
> > > datum) and several value attributes associated with each point,
from
> > > numerous csv files (with an average of 6,000-9,000 points in each
file) as
> > > shown in the following example:
> > >
> > > data<- read.csv("R_find_pts_testdata.csv")
> > >
> > > > data
> > > ID Date Time LAT LON
Conc
> > > Leg.Speed CO2 H2O BC61 Hr Min Sec
> > > 1 76 4/19/2021 21:25:38 42.40066 -70.98802 99300 0.0 mph
428.39 9.57
> > > 578 21 25 38
> > > 2 77 4/19/2021 21:25:39 42.40066 -70.98802 96730 0.0 mph
428.04 9.57
> > > 617 21 25 39
> > > 3 79 4/19/2021 21:25:41 42.40066 -70.98802 98800 0.2 mph
427.10 9.57
> > > 1027 21 25 41
> > > 4 80 4/19/2021 21:25:42 42.40066 -70.98802 96510 2 mph
427.99 9.58
> > > 1381 21 25 42
> > > 5 81 4/19/2021 21:25:43 42.40067 -70.98801 95540 3 mph
427.99 9.58
> > > 1271 21 25 43
> > > 6 82 4/19/2021 21:25:44 42.40068 -70.98799 94720 4 mph
427.20 9.57
> > > 910 21 25 44
> > > 7 83 4/19/2021 21:25:45 42.40069 -70.98797 94040 5 mph
427.18 9.57
> > > 652 21 25 45
> > > 8 84 4/19/2021 21:25:46 42.40072 -70.98795 95710 7 mph
427.07 9.57
> > > 943 21 25 46
> > > 9 85 4/19/2021 21:25:47 42.40074 -70.98792 96200 8 mph
427.44 9.56
> > > 650 21 25 47
> > > 10 86 4/19/2021 21:25:48 42.40078 -70.98789 93750 10 mph
428.76 9.57
> > > 761 21 25 48
> > > 11 87 4/19/2021 21:25:49 42.40081 -70.98785 93360 11 mph
429.25 9.56
> > > 1158 21 25 49
> > > 12 88 4/19/2021 21:25:50 42.40084 -70.98781 94340 12 mph
429.56 9.57
> > > 107 21 25 50
> > > 13 89 4/19/2021 21:25:51 42.40087 -70.98775 92780 12 mph
428.62 9.56
> > > 720 21 25 51
> > >
> > >
> > > What I want to do is, for each point, identify all points within
50m of
> > > that point, find the minimum value of the "Conc"
attribute of each nearby
> > > set of points (including the original point) and then create a
new variable
> > > ("Conc_min") and assign this minimum value to a new
variable added to
> > > "data".
> > >
> > > So far, I have the following code:
> > >
> > > library(spdep)
> > > library(sf)
> > >
> > > setwd("C:\\mydirectory\\")
> > > data<- read.csv("R_find_pts_testdata.csv")
> > >
> > > #make sure the data is a data frame
> > > pts <- data.frame(data)
> > >
> > > #create spatial data frame and define projection
> > > pts_coords <- cbind(pts$LON, pts$LAT)
> > > data_pts <- SpatialPointsDataFrame(coords= pts_coords,
> > > data=pts, proj4string = CRS("+proj=longlat
+datum=WGS84"))
> > >
> > > #Re-project to WGS 84 / UTM zone 18N, so the analysis is in units
of m
> > > ptsUTM <- sf::st_as_sf(data_pts, coords = c("LAT",
"LON"), remove = F)%>%
> > > st_transform(32618)
> > >
> > > #create 50 m buffer around each point then intersect with points
and
> > > finally find neighbors within the buffers
> > > pts_buf <- sf::st_buffer(ptsUTM, 50)
> > > coords <- sf::st_coordinates(ptsUTM)
> > > int <- sf::st_intersects(pts_buf, ptsUTM)
> > > x <- spdep::dnearneigh(coords, 0, 50)
> > >
> > > Now at this point, I'm not sure what to either the
"int" (a sgbp list) or
> > > "x" (nb object) objects (or even if I need them both)
> > >
> > > > int
> > > Sparse geometry binary predicate list of length 974, where the
predicate
> > > was `intersects'
> > > first 10 elements:
> > > 1: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, ...
> > > 2: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, ...
> > > 3: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, ...
> > > 4: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, ...
> > > 5: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, ...
> > > 6: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, ...
> > > 7: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, ...
> > > 8: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, ...
> > > 9: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, ...
> > >
> > > > x
> > > Neighbour list object:
> > > Number of regions: 974
> > > Number of nonzero links: 34802
> > > Percentage nonzero weights: 3.668481
> > > Average number of links: 35.73101
> > >
> > > One thought is that maybe I don't need the dnearneigh
function and can
> > > instead convert "int" into a dataframe and somehow
merge or associate
> > > (perhaps with an inner join) the ID fields of the buffered and
intersecting
> > > points and then compute the minimum value of "Conc"
grouping by ID:
> > >
> > > > as.data.frame(int)
> > > row.id col.id
> > > 1 1 1
> > > 2 1 2
> > > 3 1 3
> > > 4 1 4
> > > 5 1 5
> > > 6 1 6
> > > 7 1 7
> > > 8 1 8
> > > 9 1 9
> > > 10 1 10
> > > 11 1 11
> > > 12 1 12
> > > 13 1 13
> > > 14 1 14
> > > 15 1 15
> > > 16 1 16
> > > 17 1 17
> > > 18 1 18
> > > 19 2 1
> > > 20 2 2
> > > 21 2 3
> > > 22 2 4
> > > 23 2 5
> > > 24 2 6
> > > 25 2 7
> > > 26 2 8
> > > 27 2 9
> > > 28 2 10
> > >
> > >
> > > So in the above example I'd like to take the minimum of
"Conc" among the
> > > col.id points grouped with row.id 1 (i.e., col.ids 1-18) and
assign the
> > > minimum value of this group as a new variable in data
(Data$Conc_min), and
> > > do the same for row.id 2 and all the rest of the rows.
> > >
> > > I'm just not sure how to do this and I appreciate any help
folks might
> > > have on this matter!
> > >
> > > Many thanks,
> > > -Tiffany
> > >
> > > [[alternative HTML version deleted]]
> > >
> > > ______________________________________________
> > > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more,
see
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide
> > > http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible
code.
> > >
> >
> > [[alternative HTML version deleted]]
> >
> > ______________________________________________
> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
Caution: This message originated from outside of the Tufts University
organization. Please exercise caution when clicking links or opening
attachments. When in doubt, email the TTS Service Desk at it at
tufts.edu<mailto:it at tufts.edu> or call them directly at 617-627-3376.
[[alternative HTML version deleted]]