thr3ads.net - R help - [R] Bias in sample - Logistic Regression [Oct 2008]

Maithili Shiva

2008-Oct-01 05:27 UTC

[R] Bias in sample - Logistic Regression

Hi. This is my first mail to the list. I am new to R and also to the company
where I am now employed. I have a Statistics background (a Master of Science).

However, it is only now that I understand that what I learned from textbooks is
merely bookish knowledge, and that actually applying statistics in real life is
not so simple. But I am sure I will improve accordingly.

In my new company, I have been assigned the task of developing a credit scoring
model. Using logistic regression in R, I have estimated the regression
coefficients and arrived at the probability of default using the significant
variables (or attributes).

However, as part of validating the model, I have been asked to find out whether
there is any bias in the sample data (which I used for the logistic regression)
and to use REJECT INFERENCE.

This is really new to me and I feel helpless at the moment, especially since my
confirmation will depend on this model. I would sincerely appreciate it if
someone could guide me on how to proceed and how to use this concept of
"reject inference". I did search the net, but the information I could gather
was not sufficient. Also, please suggest any other measure in R that I can use
to find out whether my sample is free of bias.

I would like to point out that I constructed the sample (of customer data)
using variables such as Sex, Gross Income, number of dependents, etc., and that
I used random numbers while constructing this data.

Thank you in advance, and I sincerely apologise for this long mail. Could
someone please help me out?

Maithili

Wensui Liu

2008-Oct-01 23:19 UTC

Hi, Shiva,

The idea of reject inference is very simple. Let's assume a credit card
environment. There are 100 applicants, out of which 50 will be approved and
booked. Therefore, we can only observe the adverse behavior, such as default
and delinquency, of the 50 booked accounts. Again, let's assume that out of the
50 booked cards, 5 are bad (default / delinquency). A natural thought is to
build a model to "cherry pick" the bad guys and then apply the same model to
all applicants.
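
To make the numbers in this example concrete, here is a small sketch in plain
Python (chosen only for illustration; nothing in it is specific to R or SAS).
It uses only the counts given above. The bounds on the bad rate for all
applicants are an added illustration: they simply assume the 50 rejected
applicants could have been anywhere from all good to all bad.

```python
# Counts taken from the example: 100 applicants, 50 booked, 5 bad.
applicants = 100
booked = 50
bad_booked = 5

# Outcomes are never observed for the rejected applicants.
rejected = applicants - booked

# Bad rate we can actually measure (booked accounts only).
observed_bad_rate = bad_booked / booked

# Illustrative bounds on the bad rate over ALL applicants:
# lowest if no rejected applicant would have gone bad,
# highest if every rejected applicant would have gone bad.
lowest_possible_rate = bad_booked / applicants
highest_possible_rate = (bad_booked + rejected) / applicants

print(f"observed bad rate among booked: {observed_bad_rate:.0%}")
print(f"possible bad rate among all applicants: "
      f"{lowest_possible_rate:.0%} to {highest_possible_rate:.0%}")
```

The wide gap between the two bounds is the information that is missing when
only booked accounts are used.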

However, we can only observe the behavior of the applicants who were booked,
which is 50, and not of all applicants, which is 100. Because the booked
accounts were already screened by the approval decision, the model result looks
better than it really is when applied to every applicant. This is the
so-called 'sample bias'. The same thing can happen in healthcare or direct
marketing as well.
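
The effect can be seen in a small simulation. This is only a sketch under
made-up assumptions, written in plain Python for illustration: the logistic
default curve and its coefficients, the noise in the screening score, and the
approval cut-off are all hypothetical. In real data the outcomes of rejected
applicants are not observable; the simulation knows them only because it
generated them.

```python
import math
import random

random.seed(2008)  # fixed seed so the run is repeatable


def default_probability(risk):
    """Hypothetical true default curve (coefficients are assumptions)."""
    return 1.0 / (1.0 + math.exp(-(-2.0 + 1.5 * risk)))


n_applicants = 20000
records = []  # (approved, defaulted) for every simulated applicant
for _ in range(n_applicants):
    risk = random.gauss(0.0, 1.0)            # latent riskiness
    screen = risk + random.gauss(0.0, 0.5)   # noisy score seen at approval
    approved = screen < 0.0                  # cut-off approves roughly half
    defaulted = random.random() < default_probability(risk)
    records.append((approved, defaulted))

booked = [d for a, d in records if a]
rejected = [d for a, d in records if not a]

bad_rate_all = sum(d for _, d in records) / n_applicants
bad_rate_booked = sum(booked) / len(booked)
bad_rate_rejected = sum(rejected) / len(rejected)

print(f"bad rate, booked only (observable): {bad_rate_booked:.3f}")
print(f"bad rate, all applicants:           {bad_rate_all:.3f}")
print(f"bad rate, rejected (unobservable):  {bad_rate_rejected:.3f}")
```

Under these assumptions the booked accounts show a lower bad rate than the
full applicant population, so anything estimated from booked accounts alone
will tend to look too optimistic for the applicants as a whole.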

Luckily enough, many people have done some excellent work on this problem.
Please do some reading on Heckman's work. Greene at NYU has a paper in this
area as well. And I believe there is also an implementation in R. If you use
SAS (widely used in industry), take a look at proc qlim.

HTH.

-- 
==============================
WenSui Liu
Acquisition Risk, Chase
Email : wensui.x.liu@chase.com
Blog   : statcompute.spaces.live.com
==============================