Dear All,
I have some questions about probit regressions.
I saw a nice introduction at
http://bit.ly/bU9xL5
and I mainly have two questions.
(1) The first is mostly about data manipulation. Consider the following
snippet:
##################################################
mydata <- read.csv(url("http://www.ats.ucla.edu/stat/r/dae/binary.csv"))
names(mydata) <- c("outcome", "x1", "x2", "x3")
# probit regression of the binary outcome on all three regressors
myprobit <- glm(outcome ~ x1 + x2 + as.factor(x3),
                family = binomial(link = "probit"), data = mydata)
print(summary(myprobit))
# now assume I can only run the regression on x1
myprobit2 <- glm(outcome ~ x1,
                 family = binomial(link = "probit"), data = mydata)
print(summary(myprobit2))
# express the data in terms of counts: one row per x1 value,
# column 1 = number of 0 outcomes, column 2 = number of 1 outcomes
md <- t(table(mydata$outcome, mydata$x1))
# create new data frame
mydatanew <- data.frame(x1 = as.numeric(row.names(md)))
mydatanew$successes <- as.numeric(md[, 2])
mydatanew$failures <- as.numeric(md[, 1])
########################################################################
where I first carry out a probit regression of the binary outcome (i.e.
taking only 0/1 as values) on 3 regressors, then I simply regress the
outcome on the x1 variable alone.
Finally, I generate the data frame mydatanew (see some of its entries below)
> mydatanew
   x1 successes failures
1 220         0        1
2 300         1        2
3 340         1        3
4 360         0        4
5 380         0        8
[...................]
where for every value of x1 I count the number of 0 and 1 outcomes
(namely the number of failures and the number of successes). This is
equivalent to having the full list of x1 values with an associated 0/1
outcome (I have simply counted them), hence it is all the information I
need to fit the probit regression of the binary outcome on x1 again, but
the data format is now different. How can I feed mydatanew to R to fit
the same probit regression on x1 only?
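To make the claimed equivalence concrete, here is a minimal sketch of
what I have in mind. It assumes simulated data rather than the UCLA file
(the names sim, simnew, fit_binary and fit_counts are made up for the
illustration), and it relies on the fact that glm() with a binomial
family accepts a two-column response of the form cbind(successes,
failures). I am not sure this is the recommended route, but the two fits
should give essentially the same coefficients:

```r
# Sketch on simulated data (an assumption, not the UCLA data set).
set.seed(1)
n  <- 400
x1 <- sample(seq(200, 800, by = 20), n, replace = TRUE)
outcome <- rbinom(n, size = 1, prob = pnorm(-2 + 0.003 * x1))
sim <- data.frame(outcome = outcome, x1 = x1)

# Fit on the individual 0/1 observations.
fit_binary <- glm(outcome ~ x1, family = binomial(link = "probit"),
                  data = sim)

# Aggregate to one row per x1 value with counts of 1s and 0s.
md <- table(sim$x1, sim$outcome)
simnew <- data.frame(x1        = as.numeric(rownames(md)),
                     successes = as.numeric(md[, "1"]),
                     failures  = as.numeric(md[, "0"]))

# Fit on the counts: the response is a two-column matrix
# (number of successes, number of failures).
fit_counts <- glm(cbind(successes, failures) ~ x1,
                  family = binomial(link = "probit"), data = simnew)

print(coef(fit_binary))
print(coef(fit_counts))
```

The residual deviance and degrees of freedom reported by summary() will
differ between the two fits, since the aggregated data have fewer rows.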
(2) This is a bit more conceptual. Let us say that you have a set of
products A, B, C, D, E, F. Each product has a list of features: x_A for
product A, x_B for B, etc.
Each customer has their own set of parameters (age, sex, income, etc.),
which I call x_cust. Finally, the customer is presented with two products
(e.g. A and D; combinations may vary, and I call each combination of two
products a scenario) and asked which one they would like to buy. Bottom
line: your data are in the format
1 x_A x_cust
0 x_D x_cust
meaning that a certain customer chose product A against product D; similarly
1 x_C x_cust
0 x_B x_cust
would mean that the customer choosing between C and B finally selected
C. Every customer needs to choose a product in a variety of different
scenarios. How would you analyze this kind of data? Is there any way I
can express, in my probit analysis, the fact that my binary outcome (buy
this product or not) always arises from the comparison of two products
only (customers are never given a choice between more than two products
in a given scenario)? Or should I simply run my probit regression on
my 0/1 outcome without any extra worry (like in the snippet above)?
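To pin down the layout I am describing, here is a small sketch of such a
data frame. All column names and values are hypothetical: price stands in
for the product features (x_A, x_D, ...), age stands in for x_cust, and
the scenario column is my own addition to mark which two rows belong to
the same comparison.

```r
# Hypothetical layout: two rows per scenario, one per product offered.
# Column names and values are invented for illustration only.
choices <- data.frame(
  customer = c(1, 1, 1, 1),
  scenario = c(1, 1, 2, 2),          # id of the pairwise comparison
  product  = c("A", "D", "C", "B"),
  chosen   = c(1, 0, 1, 0),          # 1 = bought, 0 = not bought
  price    = c(10, 12, 8, 9),        # stand-in for product features
  age      = c(35, 35, 35, 35)       # stand-in for x_cust
)

# Sanity checks: every scenario has exactly two rows, exactly one chosen.
rows_per_scenario   <- tapply(choices$chosen, choices$scenario, length)
chosen_per_scenario <- tapply(choices$chosen, choices$scenario, sum)
stopifnot(all(rows_per_scenario == 2), all(chosen_per_scenario == 1))

print(choices)
```

A plain glm() call on chosen, as in the snippet under (1), would make no
use of the scenario column, which is exactly the pairing information I am
asking about.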
Many thanks
Lorenzo