Thomas Grünewald
2011-Nov-16 15:29 UTC
[R] calculating variograms (gstat) with large data sets
Dear all,

I am aiming to calculate variograms using variogram() from gstat. The problem is that some of my data sets are very large (> 400000 points). Running the command takes several hours and does not give any error message. Nevertheless, the result does not seem to be right: the first few bins are fine (up to a distance of about 300 m), but beyond that the variogram contains lags much larger than the spatial extent of the data, and the bins are no longer contiguous. Running the code on smaller areas gives correct results (a sketch of such a test is appended after my signature), which is why I suspect memory is the problem.

I am running the code with R 2.10.1 on a Linux grid (Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz; 32 bit).

So my questions:
- Is there a better way to calculate variograms with such large data sets, or do I have to reduce the data?
- Could parallel computation (on multiple cores) be a solution? And if yes, how could that be done?

Here is the code I am using. "scans" is a 3-column data frame containing the x, y, and z values of a high-resolution (1 m) digital elevation model. The extent of the data is about 600*600 m.

library(sp)
library(gstat)

# define 50 bins, log-scaled, with a maximum of 600 m
x = seq(1, 50, 1)
a = exp(log(600)/50)
logwidth = a^x

# variogram
coordinates(scans) = ~V1+V2
v = variogram(V3~1, scans, boundaries = logwidth)

Thank you very much,
Tom

--
Thomas Grünewald
WSL Institute for Snow and Avalanche Research SLF
Research Unit Snow and Permafrost
Team Snow Cover and Micrometeorology
Flüelastr. 11
CH-7260 Davos Dorf
Tel. +41/81/417 0365
Fax. +41/81/417 0110
gruenewald@slf.ch
http://www.slf.ch
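P.S. To make the "smaller areas give correct results" remark concrete, here is a minimal sketch of such a test. It assumes 'scans' is still a plain data.frame with columns V1 (x), V2 (y) and V3 (z), i.e. before coordinates() has been applied, and the 100 m x 100 m window is an arbitrary choice for illustration:

library(sp)
library(gstat)

# crop to a 100 m x 100 m corner of the domain
# (assumes 'scans' is still a plain data.frame with columns V1, V2, V3)
small = subset(scans, V1 < min(V1) + 100 & V2 < min(V2) + 100)

# same variogram call as above, on the cropped subset
coordinates(small) = ~V1+V2
v.small = variogram(V3~1, small, boundaries = logwidth)
plot(v.small)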