On Fri, 1 Aug 2014 07:25:05 AM barbara tornimbene wrote:
> HI.
> I have a set of disease outbreak data. Each observation have a
> location (spatial coordinates) and a start date. Outbreaks that occur
> in the same location within a two week periods have to be merged.
> Basically I need to delete duplicates that have same spatial coordinated and
> start dates comprised in a two weeks range. I am ok with the first bit
> (coordinates), but It is the date range that I am not sure how to
> define. I thought about creating a dummy variable for observations within a
> date range, but those might have different locations. Any help would be
> greatly appreciated. Thanks
Hi barbara,
I assume that the spatial coordinates have to be within a certain
distance to be considered the same, unless they are based on
something like cities or health administration districts. If your
observations can be ordered by date, the problem is not too difficult.
# simulate 100 outbreaks with distinct onset dates on a coarse grid
date_range<-as.Date(c("1/1/2014","1/8/2014"),"%d/%m/%Y")
disease.df<-data.frame(
 onset=sample(seq(date_range[1],date_range[2],by=1),100),
 lat=sample(seq(-33,-35,by=-1),100,TRUE),
 lon=sample(seq(148,151,by=1),100,TRUE))
# order by onset so that each window only has to look forward
disease.df<-disease.df[order(disease.df$onset),]
disease.df$drop<-0
nobs<-dim(disease.df)[1]
for(start in 1:(nobs-1)) {
 # find the last observation within 14 days of this one
 end<-start
 while(end < nobs &&
  disease.df$onset[end+1] - disease.df$onset[start] <= 14) end<-end+1
 # skip when no later observation falls inside the window,
 # otherwise (start+1):end would count backwards
 if(end > start) {
  later<-(start+1):end
  sameplace<-
   disease.df$lat[start] == disease.df$lat[later] &
   disease.df$lon[start] == disease.df$lon[later]
  if(any(sameplace)) {
   # flag this observation and every later match in the window
   disease.df$drop[start]<-1
   disease.df$drop[later]<-disease.df$drop[later]+sameplace
  }
 }
}
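If the aim is to keep one record per merged outbreak rather than only flag the candidates, something along the following lines might work. This is a rough sketch, not the loop above: merge_outbreaks() is a hypothetical helper, it assumes the column names onset, lat and lon used in the simulated data, and it chains records, so an episode lasts as long as consecutive onsets at one place are no more than 14 days apart.

```r
# Hypothetical helper, not from the post above: keep the earliest record of
# each episode. An episode is a run of records at one location in which
# consecutive onsets are no more than `window` days apart.
merge_outbreaks <- function(df, window = 14) {
  df <- df[order(df$lat, df$lon, df$onset), ]
  place <- paste(df$lat, df$lon)
  # days since the previous record at the same place (Inf for the first)
  gap <- ave(as.numeric(df$onset), place,
             FUN = function(x) c(Inf, diff(x)))
  df[gap > window, ]
}

# toy data: four records at one place (days 0, 10, 20, 40), one at another
toy <- data.frame(
  onset = as.Date("2014-01-01") + c(0, 10, 20, 40, 5),
  lat = c(-33, -33, -33, -33, -34),
  lon = c(148, 148, 148, 148, 148))
merged <- merge_outbreaks(toy)
```

Because of the chaining, the records on days 0, 10 and 20 collapse into a single episode even though the last is 20 days after the first. If the two weeks should be counted from the first record only, a fixed window as in the loop is the closer match.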
Caution - I haven't checked this exhaustively and I have assumed that
locations must be equal, not within some distance.
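If locations only need to be within some distance of each other, the two equality tests on lat and lon could be swapped for a distance test. The sketch below uses the haversine great-circle formula; haversine_km() and the 10 km threshold are my own assumptions for illustration, and it treats the coordinates as decimal degrees.

```r
# Hypothetical helper: great-circle distance in km between points given in
# decimal degrees, vectorised over the second point.
haversine_km <- function(lat1, lon1, lat2, lon2, radius = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  # pmin guards against rounding pushing sqrt(a) just above 1
  2 * radius * asin(pmin(1, sqrt(a)))
}

# assumed threshold: treat points within 10 km as the same place
max_km <- 10
sameplace <- haversine_km(-33, 148, c(-33.05, -34), c(148.02, 148)) <= max_km
```

In the loop, the first point would be the observation at start and the second the later observations in the window.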
Jim