Dear Professor Lumley,
I am relatively new to using R and also to logistic regression. We have
analysed our Dudley Health Survey using the survey package. I am now
trying to look at associations using svyglm, but I am unsure how to
interpret the output, how to present the resulting model, and whether
there are any other checks I should carry out on the validity of the
model. Below is an example of what I have done so far. All of my factors
are coded 1, 2, 3, etc. rather than 0, 1, 2, etc.; is this a problem?
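The following is a minimal base-R sketch of why the coding should not matter. It assumes the variables are stored as factors, which the term names in the output below (q16xOK, q16xBad, ...) suggest. The vectors are made-up examples, not survey data. A factor is expanded into one 0/1 dummy column per non-reference level, so the numbers used as codes never enter the model. Only a variable left as plain numbers would be treated as a single linear term.

```r
# Hypothetical three-level answers stored with codes 1, 2, 3 ...
codes_123 <- c(1, 2, 3, 2, 1, 3)
# ... and the same answers stored with codes 0, 1, 2
codes_012 <- codes_123 - 1

# As factors, each non-reference level gets its own 0/1 dummy column;
# the first level is the reference
mm_123 <- model.matrix(~ factor(codes_123))
mm_012 <- model.matrix(~ factor(codes_012))
print(mm_123)

# Same dummy columns either way (only the column names differ)
same_columns <- identical(as.vector(mm_123), as.vector(mm_012))
print(same_columns)

# Left as plain numbers, the variable would instead enter as a single
# linear term, which is a different model
mm_numeric <- model.matrix(~ codes_123)
print(ncol(mm_numeric))  # intercept plus one slope
```

The reference category is simply the first level of each factor. If a different baseline is wanted, `relevel()` changes it. For example, `relevel(q16x, ref = "Good")` would work, although the level name "Good" is an assumption here.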
q14sum (the outcome) is eating >= 5 portions of fruit & veg per day vs eating fewer.
q16 to q19 are dietary behaviours, each categorised as good, OK or bad.
svyglm(q14sum ~ q16x + q17ax + q17bx + q17cx + q17dx + q17ex +
q17fx + q18x + q19x + heavy + binge + smoksum + q41sum +
q46 + exsum + q53 + q55 + q61 + ethnicgrp + pcs_sum + mcs_sum,
design = dudleyls1design, family = binomial)
Survey design:
svydesign(id = ~1, probs = ~prob1, strata = ~strat1, fpc = ~pop1,
data = dudleyls1)
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.929963 0.286259 -3.249 0.001170 **
q16xOK 0.433945 0.086506 5.016 5.51e-07 ***
q16xBad 0.615792 0.227680 2.705 0.006868 **
q17axOK -0.754625 0.205562 -3.671 0.000245 ***
q17axBad 0.308037 0.085643 3.597 0.000326 ***
q17bxOK 0.065208 0.084367 0.773
q17bxBad 0.038653 0.105495 0.366
q17cxOK 0.554372 0.097740 5.672 1.52e-08 ***
q17cxBad 0.755530 0.158943 4.753 2.07e-06 ***
q17dxOK 0.196032 0.089092 2.200 0.027844 *
q17dxBad -0.006027 0.346917 -0.017
q17exBad 0.649370 0.075907 8.555 < 2e-16 ***
q17fxOK 0.764552 0.092198 8.293 < 2e-16 ***
q17fxBad 1.345113 0.181593 7.407 1.58e-13 ***
q18xOK 0.074950 0.089852 0.834
q19xOK 0.080513 0.091597 0.879
q19xBad 0.213237 0.097400 2.189 0.028637 *
heavyHeavy drinkers (>3 / >4) 0.324062 0.111947 2.895 0.003816 **
bingeBinge drinkers (6+ / 8+) 0.104460 0.136534 0.765
smoksumNon-smokers -0.562577 0.111547 -5.043 4.79e-07 ***
q41sumNo accident in past 12 months 0.120413 0.087097 1.383
q46Very good 0.485598 0.174487 2.783 0.005412 **
q46Good 0.708913 0.182242 3.890 0.000102 ***
q46Fair 0.996615 0.215941 4.615 4.06e-06 ***
q46Poor 0.861350 0.282816 3.046 0.002338 **
exsumNot enough exercise 0.248127 0.076451 3.246 0.001182 **
q53No 0.233167 0.200720 1.162
q53Don't know 0.297236 0.309616 0.960
q55No 0.171200 0.200076 0.856
q55Don't know/It depends 0.274091 0.199006 1.377
q61No 0.155685 0.113807 1.368
ethnicgrpOther 1.101213 0.281635 3.910 9.39e-05 ***
pcs_sumMiddle two quartiles 0.074597 0.124994 0.597
pcs_sumTop quartile score 0.109348 0.161871 0.676
mcs_sumMiddle two quartiles -0.077917 0.107135 -0.727
mcs_sumTop quartile score -0.245545 0.129996 -1.889 0.058987 .
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 0.5965213)
Number of Fisher Scoring iterations: 5
There were 24 warnings (use warnings() to see them)
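The warnings are most likely "non-integer #successes in a binomial glm!". That is what `family = binomial` reports whenever the weights are not whole numbers, as sampling weights such as 1/prob1 usually are. As I understand the svyglm help page, the survey package suggests `family = quasibinomial()` to avoid this warning. The sketch below shows the same effect with plain `glm()` on simulated data. The data and weights are invented for illustration, and the svyglm side is not run here.

```r
set.seed(1)
n <- 300
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(-0.5 + x))
# Non-integer weights, standing in for sampling weights such as 1/prob1
w <- runif(n, min = 0.5, max = 3)

# binomial: record (and silence) the warnings so we can see that they occur
binomial_warnings <- character(0)
fit_binomial <- withCallingHandlers(
  glm(y ~ x, family = binomial, weights = w),
  warning = function(cond) {
    binomial_warnings <<- c(binomial_warnings, conditionMessage(cond))
    invokeRestart("muffleWarning")
  }
)
print(unique(binomial_warnings))

# quasibinomial: same estimating equations, but no integer check and no warning
fit_quasi <- glm(y ~ x, family = quasibinomial, weights = w)

# The coefficients agree; only the handling of the dispersion differs
print(cbind(binomial = coef(fit_binomial), quasibinomial = coef(fit_quasi)))
```

Because the estimates are identical, the warnings here are harmless. Switching the svyglm call to `family = quasibinomial()` should simply remove them.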
svyglm(q14sum ~ q16x + q17ax + q17cx + q17dx + q17ex + q17fx +
q19x + heavy + smoksum + q46 + exsum + ethnicgrp, design = dudleyls1design,
family = binomial)
Survey design:
svydesign(id = ~1, probs = ~prob1, strata = ~strat1, fpc = ~pop1,
data = dudleyls1)
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.70031 0.19458 -3.599 0.000323 ***
q16xOK 0.49963 0.07974 6.265 4.06e-10 ***
q16xBad 0.57798 0.19554 2.956 0.003134 **
q17axOK -0.64888 0.18319 -3.542 0.000401 ***
q17axBad 0.30817 0.07937 3.883 0.000105 ***
q17cxOK 0.50542 0.08794 5.747 9.66e-09 ***
q17cxBad 0.77768 0.14523 5.355 8.99e-08 ***
q17dxOK 0.17925 0.08222 2.180 0.029313 *
q17dxBad 0.08186 0.33318 0.246 0.805924
q17exBad 0.70146 0.07022 9.989 < 2e-16 ***
q17fxOK 0.73539 0.08525 8.626 < 2e-16 ***
q17fxBad 1.35306 0.16878 8.017 1.37e-15 ***
q19xOK 0.13074 0.08431 1.551 0.121030
q19xBad 0.33539 0.08956 3.745 0.000183 ***
heavyHeavy drinkers (>3 / >4) 0.40715 0.07698 5.289 1.29e-07 ***
smoksumNon-smokers -0.61383 0.10289 -5.966 2.62e-09 ***
q46Very good 0.49491 0.16297 3.037 0.002404 **
q46Good 0.67543 0.15889 4.251 2.17e-05 ***
q46Fair 0.93525 0.17103 5.468 4.79e-08 ***
q46Poor 0.82608 0.21400 3.860 0.000115 ***
exsumNot enough exercise 0.27654 0.07021 3.939 8.31e-05 ***
ethnicgrpOther 1.00241 0.23850 4.203 2.69e-05 ***
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 0.7067328)
Number of Fisher Scoring iterations: 5
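On interpreting and presenting the output: each estimate is a log odds ratio. It compares that level with the reference level of its factor (for q16x, presumably "good"), holding the other terms fixed. `exp()` turns it into an odds ratio. Which of the two q14sum categories counts as the "event" depends on how q14sum is coded; for a factor, glm models the probability of the second level.

The sketch below takes a few estimates and standard errors copied from the reduced model above. It forms odds ratios with approximate 95% Wald intervals and draws a simple dot-and-whisker plot on a log scale. It uses a normal 1.96 multiplier, whereas svyglm's own tests use a t reference, so the limits are approximate. With the fitted object itself (hypothetically named `fit`), `exp(coef(fit))` and `exp(confint(fit))` should give the full table directly.

```r
# Estimates and standard errors copied from the reduced model output
# (a subset of terms; the rest can be added the same way)
est <- c("q16xOK" = 0.49963, "q16xBad" = 0.57798,
         "q17fxOK" = 0.73539, "q17fxBad" = 1.35306,
         "smoksumNon-smokers" = -0.61383, "ethnicgrpOther" = 1.00241)
se  <- c(0.07974, 0.19554, 0.08525, 0.16878, 0.10289, 0.23850)

z <- qnorm(0.975)  # normal approximation to the 95% multiplier
or_table <- data.frame(OR    = exp(est),
                       lower = exp(est - z * se),
                       upper = exp(est + z * se))
print(round(or_table, 2))

# Dot-and-whisker plot of the odds ratios on a log scale
k <- nrow(or_table)
op <- par(mar = c(4, 11, 1, 1))  # wide left margin for the term names
plot(or_table$OR, seq_len(k), log = "x", pch = 19,
     xlim = range(1, or_table$lower, or_table$upper),
     yaxt = "n", xlab = "Odds ratio with 95% CI (log scale)", ylab = "")
segments(or_table$lower, seq_len(k), or_table$upper, seq_len(k))
axis(2, at = seq_len(k), labels = rownames(or_table), las = 1)
abline(v = 1, lty = 2)  # OR = 1 means no association
par(op)
```

An odds ratio above 1 means higher odds of the modelled outcome than in the reference group. For example, exp(0.49963) is about 1.65 for q16xOK versus the reference level.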
I am aware that highly correlated independent factors could have a large
impact on the validity of the model, so how can I test for this? How can
I assess the accuracy of the model: should I look at hit rates, and if so,
how? Is it possible to present the outcome of the model graphically?
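The following is a minimal base-R sketch of two of these checks, on simulated data. It is not the survey data, and all variable names are hypothetical.

1. Collinearity: a variance inflation factor (VIF) for each dummy column, computed as 1 / (1 - R²) from a weighted regression of that column on all the others. Values well above about 5 to 10 are commonly read as a warning sign. For multi-level factors, generalized VIFs (as reported by `car::vif` for ordinary glm fits) are usually preferred; I have not checked whether that function accepts svyglm objects.
2. Hit rate: a weighted classification table using a 0.5 cut-off on the fitted probabilities. Hit rates depend heavily on the cut-off and on how common the outcome is, so they are only a rough guide.

With the real fit, `fitted()` on the model and the design's weights would presumably take the places of `p_hat` and `w`.

```r
set.seed(42)
n <- 500
lv <- c("Good", "OK", "Bad")

# Hypothetical data: two strongly related three-level diet factors, a smoking
# factor, non-integer weights (standing in for 1/prob1) and a 0/1 outcome
diet_a <- factor(sample(lv, n, replace = TRUE), levels = lv)
diet_b <- diet_a
redraw <- runif(n) < 0.3                 # re-draw about 30% of answers at random
diet_b[redraw] <- sample(lv, sum(redraw), replace = TRUE)
smoker <- factor(sample(c("Smoker", "Non-smoker"), n, replace = TRUE))
w <- runif(n, min = 1, max = 5)
y <- rbinom(n, 1, plogis(-0.5 + 0.6 * (diet_a == "Bad") + 0.4 * (smoker == "Smoker")))

# 1. Collinearity: VIF_j = 1 / (1 - R^2_j), where R^2_j comes from a weighted
#    regression of dummy column j on all the other dummy columns
X <- model.matrix(~ diet_a + diet_b + smoker)[, -1]  # drop the intercept
vif <- sapply(seq_len(ncol(X)), function(j) {
  r2 <- summary(lm(X[, j] ~ X[, -j], weights = w))$r.squared
  1 / (1 - r2)
})
names(vif) <- colnames(X)
print(round(vif, 2))

# 2. Hit rate: call a case an "event" when its fitted probability exceeds 0.5,
#    then compare with what was observed, weighting each person by w
fit <- glm(y ~ diet_a + diet_b + smoker, family = quasibinomial, weights = w)
p_hat <- fitted(fit)
predicted <- factor(as.integer(p_hat > 0.5), levels = 0:1)
observed  <- factor(y, levels = 0:1)
class_tab <- xtabs(w ~ observed + predicted)
print(round(prop.table(class_tab), 3))  # weighted classification table
hit_rate <- sum(diag(class_tab)) / sum(class_tab)
print(round(hit_rate, 3))
```

The diet_b columns should show inflated VIFs here because diet_b was built from diet_a. A cut-off-free alternative to the hit rate is the area under the ROC curve (the c-statistic).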
I would appreciate any help you could provide.
Many thanks
Yours faithfully
Angela Moss
Dr Angela Moss
Public Health Information Analyst
Dudley PCT
St. John's House
Union Street
Tel: 01384 366091
Fax: 01384 366485