Hi All, I am a PhD student in forestry science and I am working with LiDAR data set (huge data set). I am a brand-new in R and geostatistic (SORRY, my background it?s in forestry) but I wish improve my skill for improve myself. I wish to develop a methodology to processing a large data-set of points (typical in LiDAR) but there is a problem with memory. I had created a subsample data-base but the semivariogram is periodic shape and I am not to able to try a fit the model. This is a maximum of two weeks of work (day bay day) SORRY. Is there a geostatistical user I am very happy to listen his suggests. Data format is X, Y and Z (height to create the DEM) in txt format I have this questions: 1. After the random selection (10000 points) and fit.semivariogram model is it possible to use all LiDAR points? Because the new LiDAR power is to use huge number of points (X;Y; Z value) to create a very high resolution map of DEM and VEGETATION. The problem is the memory, but we can use a cluster-linux network to improve the capacity of R 2. Is it possible to improve the memory capacity? 3. The data has a trend and the qqplot shows a Gaussian trend. Is it possible to normalize the data (i.e. with log)? 4. When I use this R code ?subground.uk = krige(log(Z)~X+Y, subground, new.grid, v.fit, nmax=40)? to appear an Error massage: Error in eval(expr, envir, enclos) : oggetto "X" non trovato I send you a report and attach the image to explain better. all procedure is write in R-software and to improve in gstat . The number of points of GROUND data-set (4x2 km) is 5,459,916.00. The random sub- set from original data-set is 10000 (R code is: > samplerows <-sample(1:nrow(testground),size=10000,replace=FALSE) > subground <-testground[samplerows,]) 1. Fig 1 - Original data-set Histogram: show two populations; 2. Fig 2 ? original data-set density plot: show again two populations of data; 3. Fig 3 ?Original data-set Boxplot: show there aren?t outlayers un the data-set (the classification with terrascan is good and therefore there isn?t a problem with original data) 4. Fig 4 - plot random data-set in the space 5. Fig 5 ? ordinary kriging: show a trend in the space (hypothesis: the points are very close in the space) 6. Fig 6 ?de-trend dataset with: v <- variogram (log(Z)~X+Y, subground, cutoff=1800, width=100)) 7. Fig 7 ? map of semi-variogram: show an anisotropy in the space (0? is Nord= 135? major radius 45? minus radius) 8. Fig 8- semi-variogram with anisotropy (0?, 45?, 90?, 135?). Best shape is from 135? 9. Fig 9- semi-variogram fit with Gaussian Model. R code is:> v = variogram(Z~X+Y, subground, cutoff=1800, width=200, alpha=c(135))> v.fit = fit.variogram(v, vgm(psill = 1, model="Gau", range=1800, nugget0, anis=c(135, 0.5)))R code: testground2 <- read.table(file="c:/work_LIDAR_USA/R_kriging_new_set/ground_26841492694149_x yz.txt", header=T) class (testground2) coordinates (testground2)=~X+Y # this makes testground a SpatialPointsDataFrame class (testground2) str(as.data.frame(testground)) hist(testground$Z,nclass=20) #this makes a histogram plot(density(testground$Z)) #this makes a plot density boxplot(testground$Z)#this makes a boxplot samplerows<-sample(1:nrow(testground),size=10000,replace=FALSE) #select n. points from all data-base subground <-testground[samplerows,] hist(subground$Z,nclass=20) #this makes a histogram plot(density(subground$Z)) #this makes a plot density boxplot(subground$Z)#this makes a boxplot spplot(subground["Z"], col.regions=bpy.colors(), at = seq(850,1170,10)) library(maptools) library(gstat) plot(variogram(Z~1, subground)) #Ordinary Kriging (without detrend) # if there is a trend we must use a detrend fuction Z~X+Y x11(); plot(variogram(log(Z)~X+Y, subground, cutoff=1800, width=80)) #Universal Kriging (with detrend) x11(); plot(variogram(log(Z)~X+Y, subground, cutoff=1800, width=80, map=T)) x11(); plot(variogram(log(Z)~X+Y, subground, cutoff=1800, width=80, alpha=c(0, 45, 90, 135))) v = variogram(log(Z)~X+Y, subground, cutoff=1800, width=80, alpha=c(135, 45)) v.fit = fit.variogram(v, vgm(psill = 1, model="Gau", range=1800, nugget= 0, anis=c(135, 0.5))) plot(v, v.fit, plot.nu=F, pch="+") # create the new grid new.grid <- spsample(subground, type="regular", cellsize=c(1,1)) gridded(new.grid) <- TRUE fullgrid(new.grid) <- TRUE new.grid a grid #using Universal Kriging subground.uk = krige(log(Z)~X+Y, subground, new.grid, v.fit, nmax=40) #ERROR
you can try memory.limit(size=4000) only if you have 4GB of memory on the system This is not guaranteed to solve your problem though With big datasets like lidar, you are much better off getting access to a 64bit system with a ton of RAM (>64GB). Cheers Matt On Tue, Aug 5, 2008 at 1:47 PM, Alessandro <alessandro.montaghi@unifi.it>wrote:> > > Hi All, > > > > I am a PhD student in forestry science and I am working with LiDAR data > set (huge data set). I am a brand-new in R and geostatistic (SORRY, my > background it's in forestry) but I wish improve my skill for improve > myself. I wish to develop a methodology to processing a large data-set of > points (typical in LiDAR) but there is a problem with memory. I had created > a subsample data-base but the semivariogram is periodic shape and I am not > to able to try a fit the model. This is a maximum of two weeks of work (day > bay day) SORRY. Is there a geostatistical user I am very happy to listen his > suggests. Data format is X, Y and Z (height to create the DEM) in txt format > > I have this questions: > > > > 1. After the random selection (10000 points) and fit.semivariogram > model is it possible to use all LiDAR points? Because the new LiDAR power is > to use huge number of points (X;Y; Z value) to create a very high resolution > map of DEM and VEGETATION. The problem is the memory, but we can use a > cluster-linux network to improve the capacity of R > > > > 2. Is it possible to improve the memory capacity? > > > > 3. The data has a trend and the qqplot shows a Gaussian trend. Is it > possible to normalize the data (i.e. with log)? > > > > 4. When I use this R code "subground.uk = krige(log(Z)~X+Y, > subground, new.grid, v.fit, nmax=40)" to appear an Error massage: Error in > eval(expr, envir, enclos) : oggetto "X" non trovato > > > > I send you a report and attach the image to explain better. > > > > all procedure is write in R-software and to improve in gstat . The number > of points of GROUND data-set (4x2 km) is 5,459,916.00. The random sub- set > from original data-set is 10000 (R code is: > samplerows > <-sample(1:nrow(testground),size=10000,replace=FALSE) > subground > <-testground[samplerows,]) > > > > 1. Original data-set Histogram: show two populations; > > 2. original data-set density plot: show again two populations of > data; > > 3. Original data-set Boxplot: show there aren't outlayers un the > data-set (the classification with terrascan is good and therefore there > isn't a problem with original data) > > 4. ordinary kriging: show a trend in the space (hypothesis: the > points are very close in the space) > > 5. de-trend dataset with: v <- variogram (log(Z)~X+Y, subground, > cutoff=1800, width=100)) > > 6. map of semi-variogram: show an anisotropy in the space (0° is > Nord= 135° major radius 45° minus radius) > > 7. semi-variogram with anisotropy (0°, 45°, 90°, 135°), shows a best > shape is from 135° > > 8. semi-variogram fit with Gaussian Model. R code is (see the fig): > > > v = variogram(Z~X+Y, subground, cutoff=1800, width=200, alpha=c(135)) > > > v.fit = fit.variogram(v, vgm(psill = 1, model="Gau", range=1800, nugget> 0, anis=c(135, 0.5))) > > > > R code: > > > > testground2 <- > read.table(file="c:/work_LIDAR_USA/R_kriging_new_set/ground_26841492694149_xyz.txt", > header=T) > > class (testground2) > > coordinates (testground2)=~X+Y # this makes testground a > SpatialPointsDataFrame > > class (testground2) > > str(as.data.frame(testground)) > > > > hist(testground$Z,nclass=20) #this makes a histogram > > plot(density(testground$Z)) #this makes a plot density > > boxplot(testground$Z)#this makes a boxplot > > > > samplerows<-sample(1:nrow(testground),size=10000,replace=FALSE) #select n. > points from all data-base > > subground <-testground[samplerows,] > > hist(subground$Z,nclass=20) #this makes a histogram > > plot(density(subground$Z)) #this makes a plot density > > boxplot(subground$Z)#this makes a boxplot > > spplot(subground["Z"], col.regions=bpy.colors(), at = seq(850,1170,10)) > > > > library(maptools) > > library(gstat) > > plot(variogram(Z~1, subground)) #Ordinary Kriging (without detrend) > > # if there is a trend we must use a detrend fuction Z~X+Y > > x11(); plot(variogram(log(Z)~X+Y, subground, cutoff=1800, width=80)) > #Universal Kriging (with detrend) > > x11(); plot(variogram(log(Z)~X+Y, subground, cutoff=1800, width=80, map=T)) > > x11(); plot(variogram(log(Z)~X+Y, subground, cutoff=1800, width=80, > alpha=c(0, 45, 90, 135))) > > v = variogram(log(Z)~X+Y, subground, cutoff=1800, width=80, alpha=c(135, > 45)) > > v.fit = fit.variogram(v, vgm(psill = 1, model="Gau", range=1800, nugget= 0, > anis=c(135, 0.5))) > > plot(v, v.fit, plot.nu=F, pch="+") > > # create the new grid > > new.grid <- spsample(subground, type="regular", cellsize=c(1,1)) > > gridded(new.grid) <- TRUE > > fullgrid(new.grid) <- TRUE > > new.grid@grid > > #using Universal Kriging > > subground.uk = krige(log(Z)~X+Y, subground, new.grid, v.fit, nmax=40) > #ERROR > > > > > > > > > > _______________________________________________ > R-sig-Geo mailing list > R-sig-Geo@stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/r-sig-geo > >-- Matthew J. Oliver Assistant Professor College of Marine and Earth Studies University of Delaware 700 Pilottown Rd. Lewes, DE, 19958 302-645-4079 http://www.ocean.udel.edu/people/profile.aspx?moliver [[alternative HTML version deleted]]
? stato filtrato un testo allegato il cui set di caratteri non era indicato... Nome: non disponibile URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20080805/633a01c7/attachment.pl>
On Tuesday 05 August 2008, Alessandro wrote:> Hi All, > > > > I am a PhD student in forestry science and I am working with LiDAR data set > (huge data set). I am a brand-new in R and geostatistic (SORRY, my > background it?s in forestry) but I wish improve my skill for improve > myself. I wish to develop a methodology to processing a large data-set of > points (typical in LiDAR) but there is a problem with memory. I had created > a subsample data-base but the semivariogram is periodic shape and I am not > to able to try a fit the model. This is a maximum of two weeks of work (day > bay day) SORRY. Is there a geostatistical user I am very happy to listen > his suggests. Data format is X, Y and Z (height to create the DEM) in txt > format >Hi, Remeber, R is memory-bound. Why exactly are you trying to develop a method that will inherently demand support for massive datasets (LiDAR) within a framework designed to work with samples (R)? There are a wealth of gridding techniques suitable for LiDAR data that are not going to fail on massive datasets. A couple come to mind: 1. the blockmean / blockmode functions in GMT 2. r.in.xyz in GRASS 3. RST interpolation 4. natural neighbor interpolation 5 ... lots more ... i.e. why do you want to use something as memory/processor/model dependent as kriging to grid such a massive and regularly-spaced set of data? I would suggest that the time spent trying to accomplish what you propose might be better spent on thinking about the underlying purpose of the experiment. Please don't take my comments as anything other that one researcher trying to save another researcher's valuable time and effort. Cheers, Dylan> I have this questions: > > > > 1. After the random selection (10000 points) and fit.semivariogram > model is it possible to use all LiDAR points? Because the new LiDAR power > is to use huge number of points (X;Y; Z value) to create a very high > resolution map of DEM and VEGETATION. The problem is the memory, but we can > use a cluster-linux network to improve the capacity of R > > > > 2. Is it possible to improve the memory capacity? > > > > 3. The data has a trend and the qqplot shows a Gaussian trend. Is it > possible to normalize the data (i.e. with log)? > > > > 4. When I use this R code ?subground.uk = krige(log(Z)~X+Y, > subground, new.grid, v.fit, nmax=40)? to appear an Error massage: Error in > eval(expr, envir, enclos) : oggetto "X" non trovato > > > > I send you a report and attach the image to explain better. > > > > all procedure is write in R-software and to improve in gstat . The number > of points of GROUND data-set (4x2 km) is 5,459,916.00. The random sub- set > from original data-set is 10000 (R code is: > samplerows > <-sample(1:nrow(testground),size=10000,replace=FALSE) > subground > <-testground[samplerows,]) > > > > 1. Fig 1 - Original data-set Histogram: show two populations; > > 2. Fig 2 ? original data-set density plot: show again two populations > of data; > > 3. Fig 3 ?Original data-set Boxplot: show there aren?t outlayers un > the data-set (the classification with terrascan is good and therefore there > isn?t a problem with original data) > > 4. Fig 4 - plot random data-set in the space > > 5. Fig 5 ? ordinary kriging: show a trend in the space (hypothesis: > the points are very close in the space) > > 6. Fig 6 ?de-trend dataset with: v <- variogram (log(Z)~X+Y, > subground, cutoff=1800, width=100)) > > 7. Fig 7 ? map of semi-variogram: show an anisotropy in the space (0? > is Nord= 135? major radius 45? minus radius) > > 8. Fig 8- semi-variogram with anisotropy (0?, 45?, 90?, 135?). Best > shape is from 135? > > 9. Fig 9- semi-variogram fit with Gaussian Model. R code is: > > v = variogram(Z~X+Y, subground, cutoff=1800, width=200, alpha=c(135)) > > > > v.fit = fit.variogram(v, vgm(psill = 1, model="Gau", range=1800, nugget> > 0, anis=c(135, 0.5))) > > > > R code: > > > > testground2 <- > read.table(file="c:/work_LIDAR_USA/R_kriging_new_set/ground_26841492694149_ >x yz.txt", header=T) > > class (testground2) > > coordinates (testground2)=~X+Y # this makes testground a > SpatialPointsDataFrame > > class (testground2) > > str(as.data.frame(testground)) > > > > hist(testground$Z,nclass=20) #this makes a histogram > > plot(density(testground$Z)) #this makes a plot density > > boxplot(testground$Z)#this makes a boxplot > > > > samplerows<-sample(1:nrow(testground),size=10000,replace=FALSE) #select n. > points from all data-base > > subground <-testground[samplerows,] > > hist(subground$Z,nclass=20) #this makes a histogram > > plot(density(subground$Z)) #this makes a plot density > > boxplot(subground$Z)#this makes a boxplot > > spplot(subground["Z"], col.regions=bpy.colors(), at = seq(850,1170,10)) > > > > library(maptools) > > library(gstat) > > plot(variogram(Z~1, subground)) #Ordinary Kriging (without detrend) > > # if there is a trend we must use a detrend fuction Z~X+Y > > x11(); plot(variogram(log(Z)~X+Y, subground, cutoff=1800, width=80)) > #Universal Kriging (with detrend) > > x11(); plot(variogram(log(Z)~X+Y, subground, cutoff=1800, width=80, map=T)) > > x11(); plot(variogram(log(Z)~X+Y, subground, cutoff=1800, width=80, > alpha=c(0, 45, 90, 135))) > > v = variogram(log(Z)~X+Y, subground, cutoff=1800, width=80, alpha=c(135, > 45)) > > v.fit = fit.variogram(v, vgm(psill = 1, model="Gau", range=1800, nugget= 0, > anis=c(135, 0.5))) > > plot(v, v.fit, plot.nu=F, pch="+") > > # create the new grid > > new.grid <- spsample(subground, type="regular", cellsize=c(1,1)) > > gridded(new.grid) <- TRUE > > fullgrid(new.grid) <- TRUE > > new.grid at grid > > #using Universal Kriging > > subground.uk = krige(log(Z)~X+Y, subground, new.grid, v.fit, nmax=40) > #ERROR-- Dylan Beaudette Soil Resource Laboratory http://casoilresource.lawr.ucdavis.edu/ University of California at Davis 530.754.7341
On Tuesday 05 August 2008, Alessandro wrote:> Thank you Dylan > > Do you think Kriging method is not valid to processing a large data-set as > LiDAR data? > > About RST interpolation and NNI do you know a code in R? > > alePlease reply to the list next time. I would suggest going over the available literature on these techniques and LiDAR. None of these approaches are going to be feasible in R, when the input dataset is much larger than the available RAM. Cheers, Dylan> -----Messaggio originale----- > Da: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] Per > conto di Dylan Beaudette Inviato: marted? 5 agosto 2008 14.25 > A: r-help at r-project.org > Oggetto: Re: [R] LIDAR Problem in R (THANKS for HELP) > > On Tuesday 05 August 2008, Alessandro wrote: > > Hi All, > > > > > > > > I am a PhD student in forestry science and I am working with LiDAR data > > set (huge data set). I am a brand-new in R and geostatistic (SORRY, my > > background it?s in forestry) but I wish improve my skill for improve > > myself. I wish to develop a methodology to processing a large data-set of > > points (typical in LiDAR) but there is a problem with memory. I had > > created a subsample data-base but the semivariogram is periodic shape and > > I am not to able to try a fit the model. This is a maximum of two weeks > > of work (day bay day) SORRY. Is there a geostatistical user I am very > > happy to listen his suggests. Data format is X, Y and Z (height to create > > the DEM) in txt format > > Hi, > > Remeber, R is memory-bound. Why exactly are you trying to develop a method > that will inherently demand support for massive datasets (LiDAR) within a > framework designed to work with samples (R)? > > There are a wealth of gridding techniques suitable for LiDAR data that are > not going to fail on massive datasets. A couple come to mind: > > 1. the blockmean / blockmode functions in GMT > 2. r.in.xyz in GRASS > 3. RST interpolation > 4. natural neighbor interpolation > 5 ... lots more ... > > i.e. why do you want to use something as memory/processor/model dependent > as kriging to grid such a massive and regularly-spaced set of data? > > I would suggest that the time spent trying to accomplish what you propose > might be better spent on thinking about the underlying purpose of the > experiment. Please don't take my comments as anything other that one > researcher trying to save another researcher's valuable time and effort. > > Cheers, > > Dylan > > > I have this questions: > > > > > > > > 1. After the random selection (10000 points) and fit.semivariogram > > model is it possible to use all LiDAR points? Because the new LiDAR power > > is to use huge number of points (X;Y; Z value) to create a very high > > resolution map of DEM and VEGETATION. The problem is the memory, but we > > can use a cluster-linux network to improve the capacity of R > > > > > > > > 2. Is it possible to improve the memory capacity? > > > > > > > > 3. The data has a trend and the qqplot shows a Gaussian trend. Is > > it possible to normalize the data (i.e. with log)? > > > > > > > > 4. When I use this R code ?subground.uk = krige(log(Z)~X+Y, > > subground, new.grid, v.fit, nmax=40)? to appear an Error massage: Error > > in eval(expr, envir, enclos) : oggetto "X" non trovato > > > > > > > > I send you a report and attach the image to explain better. > > > > > > > > all procedure is write in R-software and to improve in gstat . The number > > of points of GROUND data-set (4x2 km) is 5,459,916.00. The random sub- > > set from original data-set is 10000 (R code is: > samplerows > > <-sample(1:nrow(testground),size=10000,replace=FALSE) > subground > > <-testground[samplerows,]) > > > > > > > > 1. Fig 1 - Original data-set Histogram: show two populations; > > > > 2. Fig 2 ? original data-set density plot: show again two > > populations of data; > > > > 3. Fig 3 ?Original data-set Boxplot: show there aren?t outlayers > > un the data-set (the classification with terrascan is good and therefore > > there isn?t a problem with original data) > > > > 4. Fig 4 - plot random data-set in the space > > > > 5. Fig 5 ? ordinary kriging: show a trend in the space > > (hypothesis: the points are very close in the space) > > > > 6. Fig 6 ?de-trend dataset with: v <- variogram (log(Z)~X+Y, > > subground, cutoff=1800, width=100)) > > > > 7. Fig 7 ? map of semi-variogram: show an anisotropy in the space > > (0? is Nord= 135? major radius 45? minus radius) > > > > 8. Fig 8- semi-variogram with anisotropy (0?, 45?, 90?, 135?). > > Best shape is from 135? > > > > 9. Fig 9- semi-variogram fit with Gaussian Model. R code is: > > > v = variogram(Z~X+Y, subground, cutoff=1800, width=200, alpha=c(135)) > > > > > > v.fit = fit.variogram(v, vgm(psill = 1, model="Gau", range=1800, > > > nugget> > > > 0, anis=c(135, 0.5))) > > > > > > > > R code: > > > > > > > > testground2 <- > > read.table(file="c:/work_LIDAR_USA/R_kriging_new_set/ground_2684149269414 > >9_ x yz.txt", header=T) > > > > class (testground2) > > > > coordinates (testground2)=~X+Y # this makes testground a > > SpatialPointsDataFrame > > > > class (testground2) > > > > str(as.data.frame(testground)) > > > > > > > > hist(testground$Z,nclass=20) #this makes a histogram > > > > plot(density(testground$Z)) #this makes a plot density > > > > boxplot(testground$Z)#this makes a boxplot > > > > > > > > samplerows<-sample(1:nrow(testground),size=10000,replace=FALSE) #select > > n. points from all data-base > > > > subground <-testground[samplerows,] > > > > hist(subground$Z,nclass=20) #this makes a histogram > > > > plot(density(subground$Z)) #this makes a plot density > > > > boxplot(subground$Z)#this makes a boxplot > > > > spplot(subground["Z"], col.regions=bpy.colors(), at = seq(850,1170,10)) > > > > > > > > library(maptools) > > > > library(gstat) > > > > plot(variogram(Z~1, subground)) #Ordinary Kriging (without detrend) > > > > # if there is a trend we must use a detrend fuction Z~X+Y > > > > x11(); plot(variogram(log(Z)~X+Y, subground, cutoff=1800, width=80)) > > #Universal Kriging (with detrend) > > > > x11(); plot(variogram(log(Z)~X+Y, subground, cutoff=1800, width=80, > > map=T)) > > > > x11(); plot(variogram(log(Z)~X+Y, subground, cutoff=1800, width=80, > > alpha=c(0, 45, 90, 135))) > > > > v = variogram(log(Z)~X+Y, subground, cutoff=1800, width=80, alpha=c(135, > > 45)) > > > > v.fit = fit.variogram(v, vgm(psill = 1, model="Gau", range=1800, nugget> > 0, anis=c(135, 0.5))) > > > > plot(v, v.fit, plot.nu=F, pch="+") > > > > # create the new grid > > > > new.grid <- spsample(subground, type="regular", cellsize=c(1,1)) > > > > gridded(new.grid) <- TRUE > > > > fullgrid(new.grid) <- TRUE > > > > new.grid at grid > > > > #using Universal Kriging > > > > subground.uk = krige(log(Z)~X+Y, subground, new.grid, v.fit, nmax=40) > > #ERROR-- Dylan Beaudette Soil Resource Laboratory http://casoilresource.lawr.ucdavis.edu/ University of California at Davis 530.754.7341