Dear Professor Lumley
I am relatively new to using R and also to logistic regression. We have
analysed our Dudley Health Survey using the survey package. I am now
trying to look at associations using svyglm but I am unsure of how to
interpret the output and present the resulting model or whether there
are any other things I should do to check the validity of the model.
Below is an example of what I have completed so far. All of my factors
are coded 1, 2, 3, etc. instead of 0, 1, 2, etc. Is this a problem?
q14sum is eating >= 5 portions of fruit & veg per day vs. eating fewer.
q16-q19 are dietary behaviours categorised into good, OK, bad.
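On the coding question, a minimal sketch using made-up values rather than the survey data: the names codes_from_1, codes_from_0, f1 and f0 are hypothetical. It assumes the predictors are stored as R factors, which the coefficient names in the output (e.g. q16xOK) suggest. In that case the underlying numbers should not matter, since R builds the dummy variables from the factor levels.

```r
# Hypothetical responses for one three-level item, coded two ways.
codes_from_1 <- c(1, 2, 3, 2, 1, 3)
codes_from_0 <- codes_from_1 - 1

f1 <- factor(codes_from_1, levels = 1:3, labels = c("Good", "OK", "Bad"))
f0 <- factor(codes_from_0, levels = 0:2, labels = c("Good", "OK", "Bad"))

# Both codings yield the same factor, hence the same dummy variables.
same_factor <- identical(f1, f0)
X1 <- model.matrix(~ f, data = data.frame(f = f1))
X0 <- model.matrix(~ f, data = data.frame(f = f0))
same_design <- identical(X1, X0)

print(c(same_factor = same_factor, same_design = same_design))
print(colnames(X1))  # the first level ("Good") is the reference category
```

If a variable were instead left as a plain number, it would enter the model as a single linear term, and shifting the coding by one would change only the intercept.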
summary(svyglm(q14sum~q16x+q17ax+q17bx+q17cx+q17dx+q17ex+q17fx+q18x+q19x+
heavy+binge+smoksum+q41sum+q46+exsum+q53+q55+q61+ethnicgrp+pcs_sum+mcs_sum,
design=dudleyls1design,family=binomial))
Call:
svyglm(q14sum ~ q16x + q17ax + q17bx + q17cx + q17dx + q17ex +
q17fx + q18x + q19x + heavy + binge + smoksum + q41sum +
q46 + exsum + q53 + q55 + q61 + ethnicgrp + pcs_sum + mcs_sum,
design = dudleyls1design, family = binomial)
Survey design:
svydesign(id = ~1, probs = ~prob1, strata = ~strat1, fpc = ~pop1,
data = dudleyls1)
Coefficients:
                                     Estimate Std. Error t value Pr(>|t|)
(Intercept)                         -0.929963   0.286259  -3.249 0.001170 **
q16xOK                               0.433945   0.086506   5.016 5.51e-07 ***
q16xBad                              0.615792   0.227680   2.705 0.006868 **
q17axOK                             -0.754625   0.205562  -3.671 0.000245 ***
q17axBad                             0.308037   0.085643   3.597 0.000326 ***
q17bxOK                              0.065208   0.084367   0.773 0.439627
q17bxBad                             0.038653   0.105495   0.366 0.714093
q17cxOK                              0.554372   0.097740   5.672 1.52e-08 ***
q17cxBad                             0.755530   0.158943   4.753 2.07e-06 ***
q17dxOK                              0.196032   0.089092   2.200 0.027844 *
q17dxBad                            -0.006027   0.346917  -0.017 0.986140
q17exBad                             0.649370   0.075907   8.555  < 2e-16 ***
q17fxOK                              0.764552   0.092198   8.293  < 2e-16 ***
q17fxBad                             1.345113   0.181593   7.407 1.58e-13 ***
q18xOK                               0.074950   0.089852   0.834 0.404252
q19xOK                               0.080513   0.091597   0.879 0.379461
q19xBad                              0.213237   0.097400   2.189 0.028637 *
heavyHeavy drinkers (>3 / >4)        0.324062   0.111947   2.895 0.003816 **
bingeBinge drinkers (6+ / 8+)        0.104460   0.136534   0.765 0.444270
smoksumNon-smokers                  -0.562577   0.111547  -5.043 4.79e-07 ***
q41sumNo accident in past 12 months  0.120413   0.087097   1.383 0.166895
q46Very good                         0.485598   0.174487   2.783 0.005412 **
q46Good                              0.708913   0.182242   3.890 0.000102 ***
q46Fair                              0.996615   0.215941   4.615 4.06e-06 ***
q46Poor                              0.861350   0.282816   3.046 0.002338 **
exsumNot enough exercise             0.248127   0.076451   3.246 0.001182 **
q53No                                0.233167   0.200720   1.162 0.245450
q53Don't know                        0.297236   0.309616   0.960 0.337109
q55No                                0.171200   0.200076   0.856 0.392231
q55Don't know/It depends             0.274091   0.199006   1.377 0.168500
q61No                                0.155685   0.113807   1.368 0.171400
ethnicgrpOther                       1.101213   0.281635   3.910 9.39e-05 ***
pcs_sumMiddle two quartiles          0.074597   0.124994   0.597 0.550672
pcs_sumTop quartile score            0.109348   0.161871   0.676 0.499384
mcs_sumMiddle two quartiles         -0.077917   0.107135  -0.727 0.467100
mcs_sumTop quartile score           -0.245545   0.129996  -1.889 0.058987 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 0.5965213)
Number of Fisher Scoring iterations: 5
There were 24 warnings (use warnings() to see them)
> summary(svyglm(q14sum~q16x+q17ax+q17cx+q17dx+q17ex+q17fx+q19x+heavy+smoksum+
q46+exsum+ethnicgrp,design=dudleyls1design,family=binomial))
Call:
svyglm(q14sum ~ q16x + q17ax + q17cx + q17dx + q17ex + q17fx +
q19x + heavy + smoksum + q46 + exsum + ethnicgrp, design = dudleyls1design,
family = binomial)
Survey design:
svydesign(id = ~1, probs = ~prob1, strata = ~strat1, fpc = ~pop1,
data = dudleyls1)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.70031 0.19458 -3.599 0.000323 ***
q16xOK 0.49963 0.07974 6.265 4.06e-10 ***
q16xBad 0.57798 0.19554 2.956 0.003134 **
q17axOK -0.64888 0.18319 -3.542 0.000401 ***
q17axBad 0.30817 0.07937 3.883 0.000105 ***
q17cxOK 0.50542 0.08794 5.747 9.66e-09 ***
q17cxBad 0.77768 0.14523 5.355 8.99e-08 ***
q17dxOK 0.17925 0.08222 2.180 0.029313 *
q17dxBad 0.08186 0.33318 0.246 0.805924
q17exBad 0.70146 0.07022 9.989 < 2e-16 ***
q17fxOK 0.73539 0.08525 8.626 < 2e-16 ***
q17fxBad 1.35306 0.16878 8.017 1.37e-15 ***
q19xOK 0.13074 0.08431 1.551 0.121030
q19xBad 0.33539 0.08956 3.745 0.000183 ***
heavyHeavy drinkers (>3 / >4) 0.40715 0.07698 5.289 1.29e-07 ***
smoksumNon-smokers -0.61383 0.10289 -5.966 2.62e-09 ***
q46Very good 0.49491 0.16297 3.037 0.002404 **
q46Good 0.67543 0.15889 4.251 2.17e-05 ***
q46Fair 0.93525 0.17103 5.468 4.79e-08 ***
q46Poor 0.82608 0.21400 3.860 0.000115 ***
exsumNot enough exercise 0.27654 0.07021 3.939 8.31e-05 ***
ethnicgrpOther 1.00241 0.23850 4.203 2.69e-05 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 0.7067328)
Number of Fisher Scoring iterations: 5
I am aware that if independent factors are highly correlated, this
could have a large impact on the validity of the model, so how can I
test for this? How can I assess the accuracy of the model: should I
look at hit rates, and if so, how? Is it possible to present the outcome
of the model graphically?
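As a starting point for these questions, here is a hedged sketch rather than a definitive recipe. It uses the api example data shipped with the survey package, not the Dudley survey, so the design, variables and the names dstrat, fit, or_table and coef_cor are illustrative assumptions. It shows one common way to present a logistic svyglm fit, as odds ratios with confidence intervals, and one rough check for predictors whose estimates are strongly entangled. It also uses family = quasibinomial(), which gives the same point estimates as binomial; weighted binomial fits typically warn about non-integer successes, which may be what the warnings in the first output refer to.

```r
library(survey)

# Example data shipped with the survey package (not the Dudley survey).
data(api)

# Stratified design without clustering, similar in shape to the design above.
dstrat <- svydesign(id = ~1, strata = ~stype, weights = ~pw,
                    fpc = ~fpc, data = apistrat)

# Logistic regression on a binary outcome.
fit <- svyglm(sch.wide ~ ell + meals + mobility,
              design = dstrat, family = quasibinomial())

# Odds ratios with 95% confidence intervals, a common way to report the model.
or_table <- exp(cbind(OR = coef(fit), confint(fit)))
print(round(or_table, 3))

# Correlations between coefficient estimates; values near +1 or -1 suggest
# predictors whose separate effects are hard to distinguish.
coef_cor <- cov2cor(vcov(fit))
print(round(coef_cor, 2))
```

The odds-ratio table can also be drawn as points with interval bars for a graphical summary. For categorical predictors such as those in the models above, cross-tabulating pairs of predictors with svytable is another simple way to look for strong associations.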
I would appreciate any help you could provide.
Many thanks
Yours faithfully
Angela Moss
Dr Angela Moss
Public Health Information Analyst
Dudley PCT
St. John's House
Union Street
Dudley
DY2 8PP
Tel: 01384 366091
Fax: 01384 366485