Hi
I am attempting to explore the scale of spatial autocorrelation in a raster
(eventually across a stack of 10 but for now a single layer) and consequently in
a potential sample of points across the landscape (i.e. if we wanted to know what
sampling design in terms of distance would minimize autocorrelation). I've spent
a couple of days trying to understand the various ways to evaluate spatial
autocorrelation for a raster or points dataset but am struggling with a few
questions. I hope someone can kindly shed some light on the following (in my
example I'm playing with a single WorldClim layer at a resolution of 1 km,
cropped to the eastern third of the USA):
1) In spdep, I've done the following (with my [potentially erroneous] thinking
laid out in the comments and two questions at the end):
library(raster)
library(spdep)
### use the raster package to get a regular sample of points across the raster
### because using the full set of cells or their centroids on a large raster
### seems to crash R downstream
### (my_raster stands for my cropped WorldClim layer)
y <- sampleRegular(my_raster, size=1000, xy=TRUE)
### tidy point dataset
### in particular, missing values (e.g. the ocean) in the raster and thus in
### the points lead to errors later, so remove these
### (sampleRegular returns a matrix, so convert it before adding the ID column)
dd <- as.data.frame(y[complete.cases(y),])
dd$ID <- row.names(dd)
xy <- coordinates(dd[,c("x","y")])
### make nb object (provides list of nearest neighbours for lower lag class)
### here I've chosen k=8 which I'm assuming, given the regular sampling of
### points, is almost akin to the 'queen' design in the raster-specific
### cell2nb command (except for cells near the ocean)
k1_nb <- knn2nb(knearneigh(xy, k=8, longlat=TRUE), row.names=dd$ID)
### make correlogram
sp.cor <- sp.correlogram(k1_nb, dd$V4, order=15, method="I")
plot(sp.cor)
Two questions here:
a) I've been able to successfully set the order to 15 but not 20 before there
are empty neighbour sets found for this particular dataset. Is there a way,
other than by trial and error, to tell the maximum order possible?
b) After plotting the correlogram, I get Moran's I as a function of lag
order. I see it crosses the 0 line between lags 13 and 14. Is there a way to
tell what distance this amounts to in km?
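For what it's worth, here is a rough sketch of how I imagine both questions could be checked, assuming spdep's nblag(), card() and nbdists() behave as I understand them from their help pages. The grid `pts` and the value of `max_lag` are made up for illustration and are not my real data:

```r
library(spdep)

# Hypothetical stand-in for the sampled points: a regular lon/lat grid
pts <- as.matrix(expand.grid(x = seq(-90, -80, by = 0.5),
                             y = seq(30, 40, by = 0.5)))

# Same neighbour definition as above (k = 8, great-circle distances)
nb <- knn2nb(knearneigh(pts, k = 8, longlat = TRUE))

# Higher-order neighbour lists, one nb object per lag
max_lag <- 5
lags <- nblag(nb, maxlag = max_lag)

# (a) number of points with an empty neighbour set at each lag
n_empty <- sapply(lags, function(l) sum(card(l) == 0))

# (b) mean distance between neighbours at each lag
#     (in km, because longlat = TRUE)
mean_km <- sapply(lags, function(l) {
  mean(unlist(nbdists(l, pts, longlat = TRUE)))
})

print(data.frame(lag = seq_len(max_lag), n_empty = n_empty, mean_km = mean_km))
```

If this is right, the first lag with n_empty > 0 would presumably be where sp.correlogram starts to complain, and mean_km would give an approximate distance for the lag at which Moran's I crosses zero.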
2) Using the pgirmess package (which I understand to be calculating the lags in
a fundamentally different way) I can get a correlogram with distances:
### so now I reproject the raster to Albers equal area in order to have the
### units on the x axis be metres (and actually the projection I want to use
### in the end anyway)
### the rest of the steps to create dd2 are the same as above
### use the correlog function to create correlogram
pgi.cor <- correlog(coords=dd2[,1:2], z=dd2$V3, method="Moran", nbclass=20)
plot(pgi.cor)
Questions:
a) In my new plot, the distance class at which Moran's I is no longer
significantly different from zero is around 600 km. That seems really far to
me. Am I wrong in my interpretation that this is the distance beyond which
sample sites would be relatively free from autocorrelation? Or is this truly
representative of the scale of autocorrelation that I can expect in climate
data over the relatively modest topographic complexity of the eastern USA?
b) In general, when, or for what types of questions or datasets, is the approach
used by spdep to generate the lag steps more appropriate than the (fixed bins?)
method of pgirmess?
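To make sure I understand the contrast, this is how I picture fixed distance bins, as a base-R sketch with made-up coordinates. It only illustrates equal-width binning of pairwise distances and is not pgirmess's actual code; `xy_demo` and `nbclass_demo` are hypothetical names:

```r
set.seed(1)

# Hypothetical projected coordinates, in km
xy_demo <- cbind(x = runif(50, 0, 1000), y = runif(50, 0, 1000))

# All pairwise Euclidean distances between the 50 points
d <- as.vector(dist(xy_demo))

# Split the distance range into equal-width classes
nbclass_demo <- 10
bins <- cut(d, breaks = nbclass_demo)

# Number of point pairs falling into each distance class
pairs_per_bin <- table(bins)
print(pairs_per_bin)
```

As far as I can tell, the lags in spdep are instead steps through the neighbour graph, so the distance covered by one lag depends on how the neighbours were defined.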
3) Finally...
Please forgive me if I'm approaching this problem incorrectly altogether! I'm
eventually hoping to say something along the lines of "if we take sites x
distance apart, we can be fairly sure that the amount of spatial autocorrelation
in our climate data will be minimal". But maybe this is completely ridiculous?
I'd be really happy to have some suggestions. (And on a side note, I'm currently
looking for a good introductory spatial statistics course or textbook,
something for the truly uninitiated. Any recommendations?)
Many thanks!
PS. Online sources for some of the code above:
http://www.r-bloggers.com/spatial-correlograms-in-r-a-mini-overview/
http://www.bias-project.org.uk/ASDARcourse/unit6_slides.pdf