On Jul 28, 2012, at 17:37 , Giorgio Monti wrote:
> I'm a student. I'm working on a research project using the statistical
> program "R 2.15.1".
> Here's my problem: how can I do a regression considering only values over a
> certain limit?
> For example, considering the dataset "Workinghours" of the "Ecdat" package,
> is it possible to build a predictive model that expresses the probability
> that a wife works more than 8 hours per day?
> The dataset includes 3382 observations on the number of hours spent working
> by wives per year in the USA.
>
> hoursday=hours/240
> index<-which(hoursday>=8)
> hoursday[index]
>
> As you see, I'm able to extract the values that in 'hoursday' (which is
> hours/240 working days in one year) are > 8.0, but obviously I can't do a
> regression because the extracted data are a subset of the entire dataset (955
> observations), while the other variables, like age, occupation, income,
> etc. are still complete (3382).
>
> So I can't do:
> lm = lm(hoursday[index] ~
> income+age+education+unemp+child5+child13+child17+nonwhite+owned+mortgage+occupation)
> In fact "R" gives me: Error in model.frame.default(formula =
> hoursday[index] ~ income, drop.unused.levels = TRUE) : variable lengths
> differ (found for 'income').
>
> Can you help me?
>
Yes: don't do that. You are not going to "build a predictive model that
expresses the probability that a wife works more than 8 hours per day" from
data where everyone works more than 8 hours per day!
You can either fit the model to all data and work out the probabilistic
consequences, or if you don't quite believe the normality assumption of
linear models, perhaps reduce the outcome to 0/1 and turn to logit or probit
regression.
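
A minimal sketch of those two options, run on simulated data rather than on
the Workinghours data. The variable names, coefficients, and sample size are
invented for illustration, and the first option ignores the uncertainty in the
estimated coefficients:

```r
# Simulated stand-in for the real data; every number here is made up.
set.seed(1)
n <- 1000
d <- data.frame(age = runif(n, 20, 60), income = rnorm(n, 300, 80))
d$hoursday <- 2 + 0.08 * d$age + 0.008 * d$income + rnorm(n, sd = 2)

# Option 1: linear model on ALL observations, then the implied
# P(hoursday > 8) under the normal-errors assumption.
fit_lm <- lm(hoursday ~ age + income, data = d)
sigma_hat <- summary(fit_lm)$sigma
p_lm <- pnorm((fitted(fit_lm) - 8) / sigma_hat)

# Option 2: reduce the outcome to 0/1 and fit a logistic regression.
d$over8 <- as.integer(d$hoursday > 8)
fit_logit <- glm(over8 ~ age + income, family = binomial, data = d)
p_logit <- predict(fit_logit, type = "response")

# Both columns are per-observation probabilities of working more than
# 8 hours per day, estimated from the full sample.
head(cbind(p_lm, p_logit))
```

In both cases the model sees every observation, including those below the
8-hour limit; only the way the probability is derived differs.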
It is not technically hard to fit a model to a subset of the data, but it is
a big no-no to subset on the dependent variable. Well, you can, and people
actually do subsample on the response variable, but then the standard methods
of analysis do not apply.
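
To illustrate why, here is a small simulation with invented numbers (not the
Workinghours data). It uses the `subset` argument of `lm()`, which keeps all
variables the same length, to refit after keeping only large responses:

```r
# Simulated data with a known slope of 1; all values are made up.
set.seed(2)
n <- 5000
x <- rnorm(n)
y <- 7 + 1 * x + rnorm(n, sd = 2)

# Fit on all observations: the slope estimate is close to the true value.
slope_all <- unname(coef(lm(y ~ x))["x"])

# Fit only where the RESPONSE is at least 8: the slope is pulled towards 0,
# because low-x cases survive the cut only when their error term is large.
keep <- y >= 8
slope_sub <- unname(coef(lm(y ~ x, subset = keep))["x"])

c(all_data = slope_all, subset_on_y = slope_sub)
```

The subset fit runs without error, but its coefficients no longer estimate
the relationship in the full population.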
--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com