I have a data set to be analyzed using to binary logistic regression. The data set is iin grouped form. My question is: how I can compute Hosmer-Lemeshow test and measures like sensitivity and specificity? Any suggestion will be greatly appreciated. Thank you Endy [[alternative HTML version deleted]]
Endy, See the package ResourceSelection for the HL test and the package caret for the sensitivity and specificity measures. Regards, Jose Iparraguirre Chief Economist Age UK, London ________________________________________ From: r-help-bounces at r-project.org [r-help-bounces at r-project.org] On Behalf Of Endy BlackEndy [pertsou at gmail.com] Sent: 14 April 2013 19:05 To: R-Help Subject: [R] Logistic regression I have a data set to be analyzed using to binary logistic regression. The data set is iin grouped form. My question is: how I can compute Hosmer-Lemeshow test and measures like sensitivity and specificity? Any suggestion will be greatly appreciated. Thank you Endy [[alternative HTML version deleted]] ______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Please donate to the Syria Crisis Appeal by text or online: To donate ?5 by mobile, text SYRIA to 70800. To donate online, please visit http://www.ageinternational.org.uk/syria Over one million refugees are desperately in need of water, food, healthcare, warm clothing, blankets and shelter; Age International urgently needs your support to help affected older refugees. Age International is a subsidiary charity of Age UK and a member of the Disasters Emergency Committee (DEC). The DEC launches and co-ordinates national fundraising appeals for public donations on behalf of its member agencies. Texts cost ?5 plus one standard rate message. Age International will receive a minimum of ?4.96. More info at ageinternational.org.uk/SyriaTerms ------------------------------- Age UK is a registered charity and company limited by guarantee, (registered charity number 1128267, registered company number 6825798). Registered office: Tavis House, 1-6 Tavistock Square, London WC1H 9NA. For the purposes of promoting Age UK Insurance, Age UK is an Appointed Representative of Age UK Enterprises Limited, Age UK is an Introducer Appointed Representative of JLT Benefit Solutions Limited and Simplyhealth Access for the purposes of introducing potential annuity and health cash plans customers respectively. Age UK Enterprises Limited, JLT Benefit Solutions Limited and Simplyhealth Access are all authorised and regulated by the Financial Services Authority. ------------------------------ This email and any files transmitted with it are confidential and intended solely for the use of the individual or entity to whom they are addressed. If you receive a message in error, please advise the sender and delete immediately. Except where this email is sent in the usual course of our business, any opinions expressed in this email are those of the author and do not necessarily reflect the opinions of Age UK or its subsidiaries and associated companies. Age UK monitors all e-mail transmissions passing through its network and may block or modify mails which are deemed to be unsuitable. Age Concern England (charity number 261794) and Help the Aged (charity number 272786) and their trading and other associated companies merged on 1st April 2009. Together they have formed the Age UK Group, dedicated to improving the lives of people in later life. The three national Age Concerns in Scotland, Northern Ireland and Wales have also merged with Help the Aged in these nations to form three registered charities: Age Scotland, Age NI, Age Cymru.
Dear colleagues I have a couple of problems related with binary logistic regression. The first problem is how to compute Pearson and Likelihood chi-squeared tests for grouped data. For the same form of data set how to compute sensitivity, specificity and related measures. When I speak about grouped data I mean data of the following form Alcohol.Consum Malformed NoMalformed 0 48 17066 <1 38 14464 1-2 5 788 3-5 1 126 >=6 1 37 (This data set has been taken from the book “Categorical data analysis by Agresti”) The second question is the following:is it correct to upload a grouped data set to an ungrouped one? The upload is achieved with the aid of the following routine Createdataframe <- function(d){ f1 = f2 = numeric(0) v = d[,1] for(j in 2:3){ for(i in 1:dim(d)[1]){ f1 = c( f1, rep( levels(v)[v[i]], d[i,j] ) ) } f2 = c( f2, rep( 3-j, sum(d[,j]) ) ) } df=data.frame(f1,f2) return(df) } My finally question is why SPSS computes the Hosmer-Lemeshow test while the routine listed below gives NaN. (One has to put g=8 in order to get a numeric value) The data set is (it is also has been taken from the same book mentioned above) FlightNo Temp ThermalDisast 1 66 0 2 70 1 3 69 0 4 68 0 5 67 0 6 72 0 7 73 0 8 70 0 9 57 1 10 63 1 11 70 1 12 78 0 13 67 0 14 53 1 15 67 0 16 75 0 17 70 0 18 81 0 19 76 0 20 79 0 21 75 1 22 76 0 23 58 1 and the routine used is hosmerlemeshow = function(obj, g=10) { # first, check to see if we fed in the right kind of object stopifnot(family(obj)$family=="binomial" && family(obj)$link=="logit") y = obj$model[[1]] # the double bracket (above) gets the index of items within an object if (is.factor(y)) y = as.numeric(y)==2 yhat = obj$fitted.values cutyhat=cut(yhat,quantile(yhat,0:g/g),include.lowest=TRUE) obs = xtabs(cbind(1 - y, y) ~ cutyhat) expect = xtabs(cbind(1 - yhat, yhat) ~ cutyhat) if (any(expect < 5)) # warning("Some expected counts are less than 5. Use smaller number of groups") chisq = sum((obs - expect)^2/expect) P = 1 - pchisq(chisq, g - 2) cat("H-L2 statistic",round(chisq,4)," and its p value",P,"\n") # by returning an object of class "htest", the function will perform like the # built-in hypothesis tests return(structure(list( method = c(paste("Hosmer and Lemeshow goodness-of-fit test with", g, "bins", sep=" ")), data.name = deparse(substitute(obj)), statistic = c(X2=chisq), parameter = c(df=g-2), p.value = P ), class='htest')) return(list(chisq=chisq,p.value=P)) } I found this routine in the internet. Thank you for your cooperation in advance. Any suggestion and/or solution will be greatly appreciated. With regards Endy [[alternative HTML version deleted]]
Please read the attach file. Thank you Endy
On Apr 19, 2013, at 11:45 AM, Endy BlackEndy wrote:> Please read the attach file.Please re-read : http://www.r-project.org/mail.html#instructions> Thank you > Endy > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.David Winsemius Alameda, CA, USA
Dear colleagues I have a couple of problems related with binary logistic regression. The first problem is how to compute Pearson and Likelihood chi-squeared tests for grouped data. For the same form of data set how to compute sensitivity, specificity and related measures. When I speak about grouped data I mean data of the following form Alcohol.Consum Malformed NoMalformed 0 48 17066 <1 38 14464 1-2 5 788 3-5 1 126 >=6 1 37 (This data set has been taken from the book “Categorical data analysis by Agresti”) The second question is the following:is it correct to upload a grouped data set to an ungrouped one? The upload is achieved with the aid of the following routine Createdataframe <- function(d){ f1 = f2 = numeric(0) v = d[,1] for(j in 2:3){ for(i in 1:dim(d)[1]){ f1 = c( f1, rep( levels(v)[v[i]], d[i,j] ) ) } f2 = c( f2, rep( 3-j, sum(d[,j]) ) ) } df=data.frame(f1,f2) return(df) } My finally question is why SPSS computes the Hosmer-Lemeshow test while the routine listed below gives NaN. (One has to put g=8 in order to get a numeric value) The data set is (it is also has been taken from the same book mentioned above) FlightNo Temp ThermalDisast 1 66 0 2 70 1 3 69 0 4 68 0 5 67 0 6 72 0 7 73 0 8 70 0 9 57 1 10 63 1 11 70 1 12 78 0 13 67 0 14 53 1 15 67 0 16 75 0 17 70 0 18 81 0 19 76 0 20 79 0 21 75 1 22 76 0 23 58 1 and the routine used is hosmerlemeshow = function(obj, g=10) { # first, check to see if we fed in the right kind of object stopifnot(family(obj)$family=="binomial" && family(obj)$link=="logit") y = obj$model[[1]] # the double bracket (above) gets the index of items within an object if (is.factor(y)) y = as.numeric(y)==2 yhat = obj$fitted.values cutyhat=cut(yhat,quantile(yhat,0:g/g),include.lowest=TRUE) obs = xtabs(cbind(1 - y, y) ~ cutyhat) expect = xtabs(cbind(1 - yhat, yhat) ~ cutyhat) if (any(expect < 5)) # warning("Some expected counts are less than 5. Use smaller number of groups") chisq = sum((obs - expect)^2/expect) P = 1 - pchisq(chisq, g - 2) cat("H-L2 statistic",round(chisq,4)," and its p value",P,"\n") # by returning an object of class "htest", the function will perform like the # built-in hypothesis tests return(structure(list( method = c(paste("Hosmer and Lemeshow goodness-of-fit test with", g, "bins", sep=" ")), data.name = deparse(substitute(obj)), statistic = c(X2=chisq), parameter = c(df=g-2), p.value = P ), class='htest')) return(list(chisq=chisq,p.value=P)) } Thank you for your cooperation in advance. Any suggestion and/or solution will be greatly appreciated. With regards Endy [[alternative HTML version deleted]]