I have a data set to be analyzed using to binary logistic regression. The data set is iin grouped form. My question is: how I can compute Hosmer-Lemeshow test and measures like sensitivity and specificity? Any suggestion will be greatly appreciated. Thank you Endy [[alternative HTML version deleted]]
Endy,
See the package ResourceSelection for the HL test and the package caret for the
sensitivity and specificity measures.
Regards,
Jose Iparraguirre
Chief Economist
Age UK, London
________________________________________
From: r-help-bounces at r-project.org [r-help-bounces at r-project.org] On
Behalf Of Endy BlackEndy [pertsou at gmail.com]
Sent: 14 April 2013 19:05
To: R-Help
Subject: [R] Logistic regression
I have a data set to be analyzed using to binary logistic regression. The
data set is iin grouped form. My question is: how I can compute
Hosmer-Lemeshow test and measures like sensitivity and specificity? Any
suggestion will be greatly appreciated.
Thank you
Endy
[[alternative HTML version deleted]]
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Please donate to the Syria Crisis Appeal by text or online:
To donate ?5 by mobile, text SYRIA to 70800. To donate online, please visit
http://www.ageinternational.org.uk/syria
Over one million refugees are desperately in need of water, food, healthcare,
warm clothing,
blankets and shelter; Age International urgently needs your support to help
affected older refugees.
Age International is a subsidiary charity of Age UK and a member of the
Disasters Emergency Committee (DEC).
The DEC launches and co-ordinates national fundraising appeals for public
donations on behalf of its member agencies.
Texts cost ?5 plus one standard rate message. Age International will receive a
minimum of ?4.96.
More info at ageinternational.org.uk/SyriaTerms
-------------------------------
Age UK is a registered charity and company limited by guarantee, (registered
charity number 1128267, registered company number 6825798).
Registered office: Tavis House, 1-6 Tavistock Square, London WC1H 9NA.
For the purposes of promoting Age UK Insurance, Age UK is an Appointed
Representative of Age UK Enterprises Limited, Age UK is an Introducer
Appointed Representative of JLT Benefit Solutions Limited and Simplyhealth
Access for the purposes of introducing potential annuity and health
cash plans customers respectively. Age UK Enterprises Limited, JLT Benefit
Solutions Limited and Simplyhealth Access are all authorised and
regulated by the Financial Services Authority.
------------------------------
This email and any files transmitted with it are confidential and intended
solely for the use of the individual or entity to whom they are
addressed. If you receive a message in error, please advise the sender and
delete immediately.
Except where this email is sent in the usual course of our business, any
opinions expressed in this email are those of the author and do not
necessarily reflect the opinions of Age UK or its subsidiaries and associated
companies. Age UK monitors all e-mail transmissions passing
through its network and may block or modify mails which are deemed to be
unsuitable.
Age Concern England (charity number 261794) and Help the Aged (charity number
272786) and their trading and other associated companies merged
on 1st April 2009. Together they have formed the Age UK Group, dedicated to
improving the lives of people in later life. The three national
Age Concerns in Scotland, Northern Ireland and Wales have also merged with Help
the Aged in these nations to form three registered charities:
Age Scotland, Age NI, Age Cymru.
Dear colleagues I have a couple of problems related with binary logistic
regression.
The first problem is how to compute Pearson and Likelihood chi-squeared
tests for grouped data.
For the same form of data set how to compute sensitivity, specificity and
related measures.
When I speak about grouped data I mean data of the following form
Alcohol.Consum Malformed NoMalformed
0 48 17066
<1 38 14464
1-2 5 788
3-5 1 126
>=6 1 37
(This data set has been taken from the book “Categorical data analysis by
Agresti”)
The second question is the following:is it correct to upload a grouped data
set to an ungrouped one?
The upload is achieved with the aid of the following routine
Createdataframe <- function(d){
f1 = f2 = numeric(0)
v = d[,1]
for(j in 2:3){
for(i in 1:dim(d)[1]){
f1 = c( f1, rep( levels(v)[v[i]], d[i,j] ) )
}
f2 = c( f2, rep( 3-j, sum(d[,j]) ) )
}
df=data.frame(f1,f2)
return(df)
}
My finally question is why SPSS computes the Hosmer-Lemeshow test while the
routine listed below gives NaN. (One has to put g=8 in order to get a
numeric value)
The data set is (it is also has been taken from the same book mentioned
above)
FlightNo Temp ThermalDisast
1 66 0
2 70 1
3 69 0
4 68 0
5 67 0
6 72 0
7 73 0
8 70 0
9 57 1
10 63 1
11 70 1
12 78 0
13 67 0
14 53 1
15 67 0
16 75 0
17 70 0
18 81 0
19 76 0
20 79 0
21 75 1
22 76 0
23 58 1
and the routine used is
hosmerlemeshow = function(obj, g=10) {
# first, check to see if we fed in the right kind of object
stopifnot(family(obj)$family=="binomial" &&
family(obj)$link=="logit")
y = obj$model[[1]]
# the double bracket (above) gets the index of items within an object
if (is.factor(y))
y = as.numeric(y)==2
yhat = obj$fitted.values
cutyhat=cut(yhat,quantile(yhat,0:g/g),include.lowest=TRUE)
obs = xtabs(cbind(1 - y, y) ~ cutyhat)
expect = xtabs(cbind(1 - yhat, yhat) ~ cutyhat)
if (any(expect < 5))
# warning("Some expected counts are less than 5. Use smaller number
of groups")
chisq = sum((obs - expect)^2/expect)
P = 1 - pchisq(chisq, g - 2)
cat("H-L2 statistic",round(chisq,4)," and its p
value",P,"\n")
# by returning an object of class "htest", the function will
perform like the
# built-in hypothesis tests
return(structure(list(
method = c(paste("Hosmer and Lemeshow goodness-of-fit test
with", g,
"bins", sep=" ")),
data.name = deparse(substitute(obj)),
statistic = c(X2=chisq),
parameter = c(df=g-2),
p.value = P
), class='htest'))
return(list(chisq=chisq,p.value=P))
}
I found this routine in the internet.
Thank you for your cooperation in advance. Any suggestion and/or solution
will be greatly appreciated.
With regards
Endy
[[alternative HTML version deleted]]
Please read the attach file. Thank you Endy
On Apr 19, 2013, at 11:45 AM, Endy BlackEndy wrote:> Please read the attach file.Please re-read : http://www.r-project.org/mail.html#instructions> Thank you > Endy > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.David Winsemius Alameda, CA, USA
Dear colleagues I have a couple of problems related with binary logistic
regression.
The first problem is how to compute Pearson and Likelihood chi-squeared
tests for grouped data.
For the same form of data set how to compute sensitivity, specificity and
related measures.
When I speak about grouped data I mean data of the following form
Alcohol.Consum Malformed NoMalformed
0
48 17066
<1
38 14464
1-2
5 788
3-5
1 126
>=6
1 37
(This data set has been taken from the book “Categorical data analysis by
Agresti”)
The second question is the following:is it correct to upload a grouped
data set to an ungrouped one?
The upload is achieved with the aid of the following routine
Createdataframe <- function(d){
f1 = f2 = numeric(0)
v = d[,1]
for(j in 2:3){
for(i in 1:dim(d)[1]){
f1 = c( f1, rep( levels(v)[v[i]], d[i,j] ) )
}
f2 = c( f2, rep( 3-j, sum(d[,j]) ) )
}
df=data.frame(f1,f2)
return(df)
}
My finally question is why SPSS computes the Hosmer-Lemeshow test while
the routine listed below gives NaN. (One has to put g=8 in order to get a
numeric value)
The data set is (it is also has been taken from the same book mentioned
above)
FlightNo Temp ThermalDisast
1 66 0
2 70 1
3 69 0
4 68 0
5 67 0
6 72 0
7 73 0
8 70 0
9 57 1
10 63 1
11 70 1
12 78 0
13 67 0
14 53 1
15 67 0
16 75 0
17 70 0
18 81 0
19 76 0
20 79 0
21 75 1
22 76 0
23 58 1
and the routine used is
hosmerlemeshow = function(obj, g=10) {
# first, check to see if we fed in the right kind of object
stopifnot(family(obj)$family=="binomial" &&
family(obj)$link=="logit")
y = obj$model[[1]]
# the double bracket (above) gets the index of items within an object
if (is.factor(y))
y = as.numeric(y)==2
yhat = obj$fitted.values
cutyhat=cut(yhat,quantile(yhat,0:g/g),include.lowest=TRUE)
obs = xtabs(cbind(1 - y, y) ~ cutyhat)
expect = xtabs(cbind(1 - yhat, yhat) ~ cutyhat)
if (any(expect < 5))
# warning("Some expected counts are less than 5. Use smaller number of
groups")
chisq = sum((obs - expect)^2/expect)
P = 1 - pchisq(chisq, g - 2)
cat("H-L2 statistic",round(chisq,4)," and its p
value",P,"\n")
# by returning an object of class "htest", the function will
perform
like the
# built-in hypothesis tests
return(structure(list(
method = c(paste("Hosmer and Lemeshow goodness-of-fit test
with", g,
"bins", sep=" ")),
data.name = deparse(substitute(obj)),
statistic = c(X2=chisq),
parameter = c(df=g-2),
p.value = P
), class='htest'))
return(list(chisq=chisq,p.value=P))
}
Thank you for your cooperation in advance. Any suggestion and/or solution
will be greatly appreciated.
With regards
Endy
[[alternative HTML version deleted]]