Courtney Benjamin
2016-Nov-17 03:15 UTC
[R] Archer-Lemeshow Goodness of Fit Test for Survey Data with Log. Regression
?Hello R Experts,
I am trying to implement the Archer-Lemeshow GOF Test for survey data on a
logistic regression model using the survey package based upon an R Help Archive
post that I found where Dr. Thomas Lumley advised how to do it:
http://r.789695.n4.nabble.com/Goodness-of-t-tests-for-Complex-Survey-Logistic-Regression-td4668233.html
Everything is going well until I get to the point where I have to add the
objects 'r' and 'g' as variables to the data frame by either
using the transform function or the update function to update the svrepdesign
object. The log. regression model involved uses a subset of data and some of
the values in the data frame are NA, so that is affecting my ability to add
'r' and 'g' as variables; I am getting an error because I only
have 8397 rows for the new variables and 16197 in the data frame and svrepdesign
object. I am not sure how to overcome this error.
The following is a MRE:
##Archer Lemeshow Goodness of Fit Test for Complex Survey Data with Logistic
Regression
library(RCurl)
library(survey)
data <-
getURL("https://raw.githubusercontent.com/cbenjamin1821/careertech-ed/master/elsq1adj.csv")
elsq1ch <- read.csv(text = data)
#Specifying the svyrepdesign object which applies the BRR weights
elsq1ch_brr<-svrepdesign(variables = elsq1ch[,1:16], repweights =
elsq1ch[,18:217], weights = elsq1ch[,17], combined.weights = TRUE, type =
"BRR")
elsq1ch_brr
##Resetting baseline levels for predictors
elsq1ch_brr <- update( elsq1ch_brr , F1HIMATH = relevel(F1HIMATH,"PreAlg
or Less") )
elsq1ch_brr <- update( elsq1ch_brr , BYINCOME =
relevel(BYINCOME,"0-25K") )
elsq1ch_brr <- update( elsq1ch_brr , F1RACE =
relevel(F1RACE,"White") )
elsq1ch_brr <- update( elsq1ch_brr , F1SEX = relevel(F1SEX,"Male")
)
elsq1ch_brr <- update( elsq1ch_brr , F1RTRCC =
relevel(F1RTRCC,"Academic") )
#Log. Reg. model-all curric. concentrations including F1RTRCC as a predictor
allCC <-
svyglm(formula=F3ATTAINB~F1PARED+BYINCOME+F1RACE+F1SEX+F1RGPP2+F1HIMATH+F1RTRCC,family="binomial",design=elsq1ch_brr,subset=BYSCTRL==1&G10COHRT==1,na.action=na.omit)
summary(allCC)
#Recommendations from Lumley (from R Help Archive) on implementing the Archer
Lemeshow GOF test
r <- residuals(allCC, type="response")
f<-fitted(allCC)
g<- cut(f, c(-Inf, quantile(f, (1:9)/10, Inf)))
# now create a new design object with r and g added as variables
#This is the area where I am having problems as my model involves a subset and
some values are NA as well
#I am also not sure if I am naming/specifying the new variables of r and g
properly
transform(elsq1ch,r=r,g=g)
elsq1ch_brr <- update(elsq1ch_brr,tag=g,tag=r)
#then:
decilemodel<- svyglm(r~g, design=newdesign)
regTermTest(decilemodel, ~g)
#is the F-adjusted mean residual test from the Archer Lemeshow paper
Thank you,
Courtney
?
Courtney Benjamin
Broome-Tioga BOCES
Automotive Technology II Teacher
Located at Gault Toyota
Doctoral Candidate-Educational Theory & Practice
State University of New York at Binghamton
cbenjami at btboces.org<mailto:cbenjami at btboces.org>
607-763-8633
[[alternative HTML version deleted]]
Anthony Damico
2016-Nov-17 09:28 UTC
[R] Archer-Lemeshow Goodness of Fit Test for Survey Data with Log. Regression
great minimal reproducible example, thanks. does something like this work? #Log. Reg. model-all curric. concentrations including F1RTRCC as a predictor allCC <- svyglm(formula=F3ATTAINB~F1PARED+BYINCOME+F1RACE+F1SEX+F1RGPP2+F1HIMATH+F1RTRCC,family="binomial",design=elsq1ch_brr,subset=BYSCTRL==1&G10COHRT==1,na.action=na.exclude) summary(allCC) r <- residuals(allCC, type="response") f<-fitted(allCC) your_g<- cut(f, c(-Inf, quantile(f, (1:9)/10, Inf))) elsq1ch[ elsq1ch$BYSCTRL==1&elsq1ch$G10COHRT==1 , 'your_g' ] <- your_g elsq1ch[ elsq1ch$BYSCTRL==1&elsq1ch$G10COHRT==1 , 'r' ] <- r newdesign<-svrepdesign(variables = elsq1ch, repweights = elsq1ch[,18:217], weights = elsq1ch[,17], combined.weights = TRUE, type = "BRR") decilemodel<- svyglm(r~your_g, design=newdesign,subset=BYSCTRL==1&G10COHRT==1) regTermTest(decilemodel, ~your_g) On Wed, Nov 16, 2016 at 10:15 PM, Courtney Benjamin <cbenjami at btboces.org> wrote:> ?Hello R Experts, > > I am trying to implement the Archer-Lemeshow GOF Test for survey data on a > logistic regression model using the survey package based upon an R Help > Archive post that I found where Dr. Thomas Lumley advised how to do it: > http://r.789695.n4.nabble.com/Goodness-of-t-tests-for- > Complex-Survey-Logistic-Regression-td4668233.html > > Everything is going well until I get to the point where I have to add the > objects 'r' and 'g' as variables to the data frame by either using the > transform function or the update function to update the svrepdesign > object. The log. regression model involved uses a subset of data and some > of the values in the data frame are NA, so that is affecting my ability to > add 'r' and 'g' as variables; I am getting an error because I only have > 8397 rows for the new variables and 16197 in the data frame and svrepdesign > object. I am not sure how to overcome this error. > > The following is a MRE: > > ##Archer Lemeshow Goodness of Fit Test for Complex Survey Data with > Logistic Regression > > library(RCurl) > library(survey) > > data <- getURL("https://raw.githubusercontent.com/ > cbenjamin1821/careertech-ed/master/elsq1adj.csv") > elsq1ch <- read.csv(text = data) > > #Specifying the svyrepdesign object which applies the BRR weights > elsq1ch_brr<-svrepdesign(variables = elsq1ch[,1:16], repweights > elsq1ch[,18:217], weights = elsq1ch[,17], combined.weights = TRUE, type > "BRR") > elsq1ch_brr > > ##Resetting baseline levels for predictors > elsq1ch_brr <- update( elsq1ch_brr , F1HIMATH = relevel(F1HIMATH,"PreAlg > or Less") ) > elsq1ch_brr <- update( elsq1ch_brr , BYINCOME = relevel(BYINCOME,"0-25K") ) > elsq1ch_brr <- update( elsq1ch_brr , F1RACE = relevel(F1RACE,"White") ) > elsq1ch_brr <- update( elsq1ch_brr , F1SEX = relevel(F1SEX,"Male") ) > elsq1ch_brr <- update( elsq1ch_brr , F1RTRCC = relevel(F1RTRCC,"Academic") > ) > > #Log. Reg. model-all curric. concentrations including F1RTRCC as a > predictor > allCC <- svyglm(formula=F3ATTAINB~F1PARED+BYINCOME+F1RACE+F1SEX+ > F1RGPP2+F1HIMATH+F1RTRCC,family="binomial",design> elsq1ch_brr,subset=BYSCTRL==1&G10COHRT==1,na.action=na.omit) > summary(allCC) > > #Recommendations from Lumley (from R Help Archive) on implementing the > Archer Lemeshow GOF test > r <- residuals(allCC, type="response") > f<-fitted(allCC) > g<- cut(f, c(-Inf, quantile(f, (1:9)/10, Inf))) > > # now create a new design object with r and g added as variables > #This is the area where I am having problems as my model involves a subset > and some values are NA as well > #I am also not sure if I am naming/specifying the new variables of r and g > properly > transform(elsq1ch,r=r,g=g) > elsq1ch_brr <- update(elsq1ch_brr,tag=g,tag=r) > #then: > decilemodel<- svyglm(r~g, design=newdesign) > regTermTest(decilemodel, ~g) > #is the F-adjusted mean residual test from the Archer Lemeshow paper > > Thank you, > Courtney > > ? > > Courtney Benjamin > > Broome-Tioga BOCES > > Automotive Technology II Teacher > > Located at Gault Toyota > > Doctoral Candidate-Educational Theory & Practice > > State University of New York at Binghamton > > cbenjami at btboces.org<mailto:cbenjami at btboces.org> > > 607-763-8633 > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/ > posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >[[alternative HTML version deleted]]
Courtney Benjamin
2016-Nov-18 10:06 UTC
[R] Archer-Lemeshow Goodness of Fit Test for Survey Data with Log. Regression
?Thank you, Anthony. Your approach does work; however, I am concerned to some
degree about subsetting the data prior to creating the new svrepdesign as I know
that it is not recommended to subset the data prior to creating the svrepdesign
object. I am not sure if this is a significant concern in this context as the
model was fitted with the original svrepdesign that was created prior to
subsetting any data and the new svrepdesign is being used to run the diagnostic
for the model. Any thoughts on that issue?
Also, from my understanding of the outcome of the diagnostic, low p values
indicate a poor model fit.
Sincerely,
Courtney
Courtney Benjamin
Broome-Tioga BOCES
Automotive Technology II Teacher
Located at Gault Toyota
Doctoral Candidate-Educational Theory & Practice
State University of New York at Binghamton
cbenjami at btboces.org<mailto:cbenjami at btboces.org>
607-763-8633
________________________________
From: Anthony Damico <ajdamico at gmail.com>
Sent: Thursday, November 17, 2016 4:28 AM
To: Courtney Benjamin
Cc: r-help at r-project.org
Subject: Re: [R] Archer-Lemeshow Goodness of Fit Test for Survey Data with Log.
Regression
great minimal reproducible example, thanks. does something like this work?
#Log. Reg. model-all curric. concentrations including F1RTRCC as a predictor
allCC <-
svyglm(formula=F3ATTAINB~F1PARED+BYINCOME+F1RACE+F1SEX+F1RGPP2+F1HIMATH+F1RTRCC,family="binomial",design=elsq1ch_brr,subset=BYSCTRL==1&G10COHRT==1,na.action=na.exclude)
summary(allCC)
r <- residuals(allCC, type="response")
f<-fitted(allCC)
your_g<- cut(f, c(-Inf, quantile(f, (1:9)/10, Inf)))
elsq1ch[ elsq1ch$BYSCTRL==1&elsq1ch$G10COHRT==1 , 'your_g' ] <-
your_g
elsq1ch[ elsq1ch$BYSCTRL==1&elsq1ch$G10COHRT==1 , 'r' ] <- r
newdesign<-svrepdesign(variables = elsq1ch, repweights = elsq1ch[,18:217],
weights = elsq1ch[,17], combined.weights = TRUE, type = "BRR")
decilemodel<- svyglm(r~your_g,
design=newdesign,subset=BYSCTRL==1&G10COHRT==1)
regTermTest(decilemodel, ~your_g)
On Wed, Nov 16, 2016 at 10:15 PM, Courtney Benjamin <cbenjami at
btboces.org<mailto:cbenjami at btboces.org>> wrote:
?Hello R Experts,
I am trying to implement the Archer-Lemeshow GOF Test for survey data on a
logistic regression model using the survey package based upon an R Help Archive
post that I found where Dr. Thomas Lumley advised how to do it:
http://r.789695.n4.nabble.com/Goodness-of-t-tests-for-Complex-Survey-Logistic-Regression-td4668233.html
Everything is going well until I get to the point where I have to add the
objects 'r' and 'g' as variables to the data frame by either
using the transform function or the update function to update the svrepdesign
object. The log. regression model involved uses a subset of data and some of
the values in the data frame are NA, so that is affecting my ability to add
'r' and 'g' as variables; I am getting an error because I only
have 8397 rows for the new variables and 16197 in the data frame and svrepdesign
object. I am not sure how to overcome this error.
The following is a MRE:
##Archer Lemeshow Goodness of Fit Test for Complex Survey Data with Logistic
Regression
library(RCurl)
library(survey)
data <-
getURL("https://raw.githubusercontent.com/cbenjamin1821/careertech-ed/master/elsq1adj.csv")
elsq1ch <- read.csv(text = data)
#Specifying the svyrepdesign object which applies the BRR weights
elsq1ch_brr<-svrepdesign(variables = elsq1ch[,1:16], repweights =
elsq1ch[,18:217], weights = elsq1ch[,17], combined.weights = TRUE, type =
"BRR")
elsq1ch_brr
##Resetting baseline levels for predictors
elsq1ch_brr <- update( elsq1ch_brr , F1HIMATH = relevel(F1HIMATH,"PreAlg
or Less") )
elsq1ch_brr <- update( elsq1ch_brr , BYINCOME =
relevel(BYINCOME,"0-25K") )
elsq1ch_brr <- update( elsq1ch_brr , F1RACE =
relevel(F1RACE,"White") )
elsq1ch_brr <- update( elsq1ch_brr , F1SEX = relevel(F1SEX,"Male")
)
elsq1ch_brr <- update( elsq1ch_brr , F1RTRCC =
relevel(F1RTRCC,"Academic") )
#Log. Reg. model-all curric. concentrations including F1RTRCC as a predictor
allCC <-
svyglm(formula=F3ATTAINB~F1PARED+BYINCOME+F1RACE+F1SEX+F1RGPP2+F1HIMATH+F1RTRCC,family="binomial",design=elsq1ch_brr,subset=BYSCTRL==1&G10COHRT==1,na.action=na.omit)
summary(allCC)
#Recommendations from Lumley (from R Help Archive) on implementing the Archer
Lemeshow GOF test
r <- residuals(allCC, type="response")
f<-fitted(allCC)
g<- cut(f, c(-Inf, quantile(f, (1:9)/10, Inf)))
# now create a new design object with r and g added as variables
#This is the area where I am having problems as my model involves a subset and
some values are NA as well
#I am also not sure if I am naming/specifying the new variables of r and g
properly
transform(elsq1ch,r=r,g=g)
elsq1ch_brr <- update(elsq1ch_brr,tag=g,tag=r)
#then:
decilemodel<- svyglm(r~g, design=newdesign)
regTermTest(decilemodel, ~g)
#is the F-adjusted mean residual test from the Archer Lemeshow paper
Thank you,
Courtney
?
Courtney Benjamin
Broome-Tioga BOCES
Automotive Technology II Teacher
Located at Gault Toyota
Doctoral Candidate-Educational Theory & Practice
State University of New York at Binghamton
cbenjami at btboces.org<mailto:cbenjami at btboces.org><mailto:cbenjami
at btboces.org<mailto:cbenjami at btboces.org>>
607-763-8633<tel:607-763-8633>
[[alternative HTML version deleted]]
______________________________________________
R-help at r-project.org<mailto:R-help at r-project.org> mailing list -- To
UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
[[alternative HTML version deleted]]