I use the spsurvey package a decent amount. The cont.cdftest function bins the
cdf in order to perform the test which I think is the root of the problem.
Unfortunately, the default is 3 which is the minimum number of bins.
I would contact Tom Kincaid or Tony Olsen at NHEERL WED directly to ask about
this problem.
Another option would be to take a different analytical approach (e.g., a mixed
effects model) which would allow you a lot more flexibility.
Jason Law
Statistician
City of Portland
Bureau of Environmental Services
Water Pollution Control Laboratory
6543 N Burlington Avenue
Portland, OR 97203-5452
503-823-1038
jason.law at portlandoregon.gov
-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
On Behalf Of Tim Howard
Sent: Friday, November 01, 2013 7:49 AM
To: r-help at r-project.org
Subject: [R] spsurvey analysis
All,
I've used the excellent package, spsurvey, to create spatially balanced
samples many times in the past. I'm now attempting to use the analysis
portion of the package, which compares CDFs among sub-populations to test for
differences in sub-population metrics.
- My data (count data) have many zeros, following a negative binomial or even
zero-inflated negative binomial distribution.
- Samples are within polygons of varying sizes
- I want to test whether a sample at time 1 is different from a sample at time
2. Essentially the same sample areas and number of samples.
The problem:
- cont.cdftest throws a warning and does not complete for most (but not all)
species sampled. Warning message: "The combined number of values in at
least one class is less than five. Action: The user should consider using a
smaller number of classes."
- There are plenty of samples in my two time periods (the dummy set below:
Yr1=27, Yr2=31 non-zero values).
My Question:
Why is it throwing this error and is there a way to get around it?
Reproduceable example (change path to spsurvey sample data), requires us to use
spsurvey to generate sample points:
### R code tweaked from vignettes 'Area_Design' and
'Area_Analysis'
library(spsurvey)
### Analysis set up
setwd("C:/Program Files/R/R-3.0.2/library/spsurvey/doc")
att <- read.dbf("UT_ecoregions")
shp <- read.shape("UT_ecoregions")
set.seed(4447864)
# Create the design list
Stratdsgn <- list("Central Basin and
Range"=list(panel=c(PanelOne=25), seltype="Equal"),
"Colorado Plateaus"=list(panel=c(PanelOne=25),
seltype="Equal"),
"Mojave Basin and Range"=list(panel=c(PanelOne=10),
seltype="Equal"),
"Northern Basin and
Range"=list(panel=c(PanelOne=10), seltype="Equal"),
"Southern Rockies"=list(panel=c(PanelOne=14),
seltype="Equal"),
"Wasatch and Uinta
Mountains"=list(panel=c(PanelOne=10), seltype="Equal"),
"Wyoming Basin"=list(panel=c(PanelOne=6),
seltype="Equal"))
# Select the sample design for each year
Stratsites_Yr1 <- grts(design=Stratdsgn, DesignID="STRATIFIED",
type.frame="area", src.frame="sp.object",
sp.object=shp, att.frame=att, stratum="Level3_Nam",
shapefile=FALSE)
Stratsites_Yr2 <- grts(design=Stratdsgn, DesignID="STRATIFIED",
type.frame="area", src.frame="sp.object",
sp.object=shp, att.frame=att, stratum="Level3_Nam",
shapefile=FALSE)
#extract the core information, add year as a grouping variable, add a plot ID to
link with dummy data
Yr1 <- cbind(pltID = 1001:1100, Stratsites_Yr1 at data[,c(1,2,3,5)], grp =
"Yr1")
Yr2 <- cbind(pltID = 2001:2100, Stratsites_Yr2 at data[,c(1,2,3,5)], grp =
"Yr2")
sitedat <- rbind(Yr1, Yr2)
# create dummy sampling data. Lots of zeros!
bn.a <- rnbinom(size = 0.06, mu = 19.87, n=100) bn.b <- rnbinom(size =
0.06, mu = 20.15, n=100) dat.a <- data.frame(pltID = 1001:1100, grp =
"Yr1",count = bn.a) dat.b <- data.frame(pltID = 2001:2100, grp =
"Yr2",count = bn.b) dat <- rbind(dat.a, dat.b)
####
## Analysis begins here
####
data.cont <- data.frame(siteID = dat$pltID, Density=dat$count) sites <-
data.frame(siteID = dat$pltID, Use=rep(TRUE, nrow(dat))) subpop <-
data.frame(siteID = dat$pltID,
All_years=(rep("allYears",nrow(dat))),
Year = dat$grp)
design <- data.frame(siteID = sitedat$pltID,
wgt = sitedat$wgt,
xcoord = sitedat$xcoord,
ycoord = sitedat$ycoord)
framesize <- c("Yr1"=888081202000, "Yr2"=888081202000)
## There seem to be pretty good estimates CDF_Estimates <-
cont.analysis(sites, subpop, design, data.cont,
popsize = list(All_years=sum(framesize),
Year = as.list(framesize)))
print(CDF_Estimates$Pct)
## this test fails
CDF_Tests <- cont.cdftest(sites, subpop[,c(1,3)], design, data.cont,
popsize=list(Year=as.list(framesize)))
warnprnt()
## how many records have values greater than zero, by year? Probably
irrelevant!
notZero <- dat[dat$count > 0,]
table(notZero$grp)
### end
> sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: i386-w64-mingw32/i386 (32-bit)
locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
Thanks in advance.
Tim
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.