Dear list,

After reading the following two links:

http://luna.cas.usf.edu/~mbrannic/files/regression/Logistic.html
http://www.tufts.edu/~gdallal/logistic.htm

I understand the mathematical basis for logistic regression. However, I am still not sure about the "logit".

For a categorical independent variable (IV), it is easy to understand how the "log odds" are calculated. As I understand it, the observations are first grouped by the IV and the dependent variable (DV), giving a contingency table. The columns are the levels of the IV, and the rows are the levels of the DV (0 or 1). For each column, we get the proportions of DV = 0 and DV = 1 at that level of the IV, and from these proportions the log odds can be computed. Is that right?

My problem is this: in my data set, the IVs are continuous variables. Do I still have to generate such a table and compute the log odds for each level of the IV?

In R, fitted(fit) gives the fitted probability that the DV is 1. Does an observed probability exist? If it does, how can I extract it? If the IV is categorical, the DV can readily be changed into a two-column matrix, so log(observed probability / (1 - observed probability)) might be the "log odds". I wonder what happens if the IV is continuous.

And about the residuals: it seems that the residual is not the actual DV minus the fitted probability, because in my model the extreme residuals lie well outside (0, 1). How is the residual computed?

Would you please help me? Thank you all very much.

Regards,
Bin Yue
*************
Master's student, South Botanical Garden, CAS
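For the categorical case described above, the table-based calculation can be checked against glm. The following is a minimal sketch on simulated data (the variables iv and dv and the group probabilities are hypothetical, not taken from the poster's data set): the observed log odds in each column of the table coincide with the coefficients of a logistic model that has one parameter per IV level, whether the DV is supplied as 0/1 rows or as a two-column matrix of counts.

```r
# Hypothetical simulated data: a 3-level categorical IV and a 0/1 DV
set.seed(1)
iv <- factor(sample(c("a", "b", "c"), 300, replace = TRUE))
p_true <- c(a = 0.2, b = 0.5, c = 0.7)[as.character(iv)]
dv <- rbinom(300, size = 1, prob = p_true)

# Contingency table: rows = DV (0/1), columns = levels of the IV
tab <- table(dv, iv)

# Observed proportion of DV = 1 in each column, then its log odds
p_obs <- prop.table(tab, margin = 2)["1", ]
logit_obs <- log(p_obs / (1 - p_obs))   # equivalent to qlogis(p_obs)

# "Long" 0/1 form: one intercept per level (no reference level)
fit <- glm(dv ~ iv - 1, family = binomial)

# Grouped form: two-column matrix cbind(successes, failures) per level
succ <- as.vector(tab["1", ])
fail <- as.vector(tab["0", ])
lev  <- factor(colnames(tab))
fit_grouped <- glm(cbind(succ, fail) ~ lev - 1, family = binomial)

cbind(observed = logit_obs, long = coef(fit), grouped = coef(fit_grouped))
```

Because this model has one parameter per level, its fitted probabilities reproduce the observed proportions exactly. With a continuous IV there is typically only one observation per distinct value, so the observed proportion at a value is just 0 or 1 and its log odds is infinite.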
Bin Yue <leffgh <at> 163.com> writes:

> After reading the following two links:
> http://luna.cas.usf.edu/~mbrannic/files/regression/Logistic.html
> http://www.tufts.edu/~gdallal/logistic.htm
> I understand the mathematical basis for logistic regression. However, I am
> still not sure about the "logit".
> For a categorical independent variable, it is easy to understand how the
> "log odds" are calculated. As I understand it, the observations are first
> grouped by the IV and DV, giving a contingency table...

> My problem is this: in my data set, the IVs are continuous variables.
> Do I still have to generate such a table and compute the log odds for each
> level of the IV?

Let's assume you are going to use glm in package stats. glm can be fed with data in three ways; in your case, you should use the "one-row/one 0-1 event" format, that is, the "long" style. You do not have to compute any logit; glm will do that for you.

The example coming closest to yours is the birthwt example in MASS/scripts/ch07.R and chapter 7 of Venables/Ripley MASS.

Try to generate a small, self-running example with a data set similar to yours, and you have a good chance of getting a more detailed answer.

Dieter
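A minimal sketch of the "long" format Dieter describes, using simulated data (the data frame d, its columns x and y, and the true coefficients are hypothetical): each row holds one continuous IV value and one 0/1 outcome, and glm estimates the log odds without any table being built. predict(fit, type = "link") returns the fitted log odds per row, and fitted(fit) returns the fitted probabilities, which are their inverse logit (plogis).

```r
# Hypothetical simulated data in "long" format: one row per observation,
# a continuous IV (x) and a 0/1 DV (y); no contingency table is built
set.seed(123)
d <- data.frame(x = runif(150, 0, 10))
d$y <- rbinom(nrow(d), size = 1, prob = plogis(-3 + 0.6 * d$x))

fit <- glm(y ~ x, family = binomial, data = d)
coef(fit)                            # intercept and slope on the log-odds scale

eta <- predict(fit, type = "link")   # fitted log odds for each row
p   <- fitted(fit)                   # fitted probabilities, equal to plogis(eta)
head(cbind(x = d$x, log_odds = eta, prob = p))
```

The slope is read as the change in log odds of y = 1 per unit increase in x; no per-value "observed" log odds is needed to estimate it.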
Bernardo Rangel Tura
2007-Dec-11 07:49 UTC
[R] the observed "log odds" in logistic regression
On Mon, 2007-12-10 at 19:42 -0800, Bin Yue wrote:
(...)
> My problem is this: in my data set, the IVs are continuous variables.
> Do I still have to generate such a table and compute the log odds for each
> level of the IV?

If the IV is a continuous variable, you cannot create a contingency table, because it has no levels. Similarly, you cannot calculate the log odds of P(IV = x), but you can calculate the log odds of P(IV < x), or of P(IV = x + delta) with delta tending to zero. In this case it is common to create a cut-off for the IV and fit the log odds of P(IV > x).

> In R, fitted(fit) gives the fitted probability that the DV is 1. Does an
> observed probability exist? If it does, how can I extract it? If the IV is
> categorical, the DV can readily be changed into a two-column matrix, so
> log(observed probability / (1 - observed probability)) might be the "log
> odds". I wonder what happens if the IV is continuous.
> And about the residuals: it seems that the residual is not the actual DV
> minus the fitted probability, because in my model the extreme residuals lie
> well outside (0, 1). How is the residual computed?
> Would you please help me? Thank you all very much.

To help you, please send us a small part of your data and a reproducible example; it is much easier to understand your question that way.

--
Bernardo Rangel Tura, M.D., MPH, Ph.D.
National Institute of Cardiology
Brazil
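A sketch of the cut-off idea, together with the residual question quoted above, on simulated data (x, y, the median cut-off and all other names are hypothetical choices for illustration). Dichotomizing x turns the problem back into a 2 x 2 table with an observed log odds per group. For the fitted glm, residuals() returns deviance residuals by default, not y - p; type = "response" gives y - p.

```r
# Hypothetical simulated data: continuous IV x, binary DV y
set.seed(42)
x <- rnorm(200)
y <- rbinom(200, size = 1, prob = plogis(-0.5 + 1.2 * x))

# Cut-off approach: dichotomize the IV (median cut-off is an arbitrary choice)
x_grp <- factor(x > median(x), levels = c(FALSE, TRUE),
                labels = c("low", "high"))
tab <- table(y, x_grp)                       # rows: y = 0/1, columns: group
p_obs <- prop.table(tab, margin = 2)["1", ]  # observed P(y = 1) per group
log(p_obs / (1 - p_obs))                     # observed log odds per group

# Logistic model on the continuous IV itself
fit <- glm(y ~ x, family = binomial)
p <- fitted(fit)

r_resp <- residuals(fit, type = "response")  # y - p, always within (-1, 1)
r_dev  <- residuals(fit)                     # default for glm: deviance residuals

# Deviance residual by hand: sign(y - p) * sqrt(-2 * log-likelihood term)
r_dev_manual <- sign(y - p) * sqrt(-2 * (y * log(p) + (1 - y) * log(1 - p)))

range(r_resp)
range(r_dev)
```

For y = 1 the deviance residual exceeds 1 whenever the fitted probability is below exp(-1/2) ≈ 0.61, and for y = 0 it is below -1 whenever p > 0.39, which is why extreme residuals can fall outside (0, 1). The glm object's own residuals component (fit$residuals) holds the working residuals, which are yet another type.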