Hi I am trying to make glm() work to analyze a toy logit system. I have a dataframe with x and y independent variables. I have L=1+x-y (ie coefficients 1,1,-1) then if I have a logit relation with L=log(p/(1-p)), p=1/(1+exp(L)). If I interpret "p" as the probability of success in a Bernouilli trial, and I can observe the result (0 for "no", 1 for "yes") how do I retrieve the coefficients c(1,1,-1) from the data? n <- 300 des <- data.frame(x=(1:n)/n,y=sample(n)/n) # experimental design des <- cbind(des,L=1+des$x-des$y) # L=1+x-y des <- cbind(des,p=1/(1+exp(des$L))) # p=1/(1+e^L) des <- cbind(des,obs=rbinom(n,1,des$p)) # observation: prob of success = p. My attempt is: glm(obs~x+y,data=des,family=binomial(link="logit")) But it does not retrieve the correct coefficients of c(1,1,-1) ; I would expect a reasonably close answer with so much data. What is the correct glm() call to perform my logit analysis? -- Robin Hankin Uncertainty Analyst National Oceanography Centre, Southampton European Way, Southampton SO14 3ZH, UK tel 023-8059-7743
On Fri, 15 Jul 2005, Robin Hankin wrote:> I am trying to make glm() work to analyze a toy logit system. > > I have a dataframe with x and y independent variables. I have > > L=1+x-y (ie coefficients 1,1,-1) > > then if I have a logit relation with L=log(p/(1-p)), > p=1/(1+exp(L)).Not quite, see below.> If I interpret "p" as the probability of success in a Bernouilli > trial, and I can observe the result (0 for "no", 1 for "yes") > how do I retrieve the coefficients c(1,1,-1) > from the data? > > n <- 300 > des <- data.frame(x=(1:n)/n,y=sample(n)/n) # experimental design > des <- cbind(des,L=1+des$x-des$y) # L=1+x-y > des <- cbind(des,p=1/(1+exp(des$L))) # p=1/(1+e^L)A logit would be p = e^L/(1+e^L), so your signs for L are reversed.> des <- cbind(des,obs=rbinom(n,1,des$p)) # observation: prob of > success = p. > > > My attempt is: > > glm(obs~x+y,data=des,family=binomial(link="logit")) > > But it does not retrieve the correct coefficients of c(1,1,-1) ; > I would expect a reasonably close answer with so much data.You actually have so little data.> What is the correct glm() call to perform my logit analysis?The call is correct, the expectation is not. A single bernoulli observation provides far less information than you seem to suppose. I got Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) -1.4747 0.3670 -4.019 5.85e-05 *** x -0.5549 0.4672 -1.188 0.23494 y 1.2963 0.4731 2.740 0.00614 ** and note how large the standard errors are. With 10000 examples you will get closer. Having fixed your sign change, I got Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) 0.98711 0.06024 16.39 <2e-16 *** x 1.00896 0.08052 12.53 <2e-16 *** y -0.87798 0.08031 -10.93 <2e-16 *** -- Brian D. Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595
Robin Hankin <r.hankin at noc.soton.ac.uk> writes:> Hi > > I am trying to make glm() work to analyze a toy logit system. > > I have a dataframe with x and y independent variables. I have > > L=1+x-y (ie coefficients 1,1,-1) > > then if I have a logit relation with L=log(p/(1-p)), > p=1/(1+exp(L)). > > If I interpret "p" as the probability of success in a Bernouilli > trial, and I can observe the result (0 for "no", 1 for "yes") > how do I retrieve the coefficients c(1,1,-1) > from the data? > > n <- 300 > des <- data.frame(x=(1:n)/n,y=sample(n)/n) # experimental design > des <- cbind(des,L=1+des$x-des$y) # L=1+x-y > des <- cbind(des,p=1/(1+exp(des$L))) # p=1/(1+e^L) > des <- cbind(des,obs=rbinom(n,1,des$p)) # observation: prob of > success = p. > > > My attempt is: > > glm(obs~x+y,data=des,family=binomial(link="logit")) > > But it does not retrieve the correct coefficients of c(1,1,-1) ; > I would expect a reasonably close answer with so much data. > > What is the correct glm() call to perform my logit analysis?Apart from a sign error, the only fault seems to be that you think that 300 is a large number... Try upping n to 30000 and you'll see. (The sign error is of course that log(p/(1-p)) = L implies that p = exp(L)/(1+exp(L)) = 1/(1+exp(-L))). -- O__ ---- Peter Dalgaard ??ster Farimagsgade 5, Entr.B c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918 ~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907
Seemingly Similar Threads
- How to find the accuracy of the predicted glm model with family = binomial (link = logit)
- 'mgcv' package, problem with predicting binomial (logit) data
- How does glm(family='binomial') deal with perfect sucess?
- what does the it when there is a zero events in the Logistic Regression with glm?
- about family=binomial in glm funtion