Hi,

This is my first mail to the list. I am new to R and also to the company where I am now employed. I have a background in Statistics (Master of Science). However, it is only now that I understand that what I learned from textbooks is simply bookish knowledge, and that actually applying statistics in real life is not simple. But I am sure I will upgrade myself accordingly.

In my new company, I have been assigned the task of developing a credit scoring model. Using logistic regression in R, I have estimated the regression coefficients etc. and arrived at the probability of default using the significant variables (or attributes).

However, as part of validating the model, I have been asked to find out whether there is any bias in the sample data (which I used for the logistic regression) and to use REJECT INFERENCE. This is really new to me and I am quite helpless at the moment, especially as my confirmation will be based on this model.

I would sincerely appreciate it if someone could guide me on how to proceed and use this concept of "reject inference". I did search the net, but the information I could gather was not sufficient. Also, please suggest whether there is any other measure in R that I can use to find out if my sample is free of bias.

I wish to bring to your kind notice that I constructed the sample (of customer data) using variables such as Sex, Gross Income, number of dependents etc., and that I used random numbers while constructing this data.

I thank you in advance and sincerely apologise for this long mail. Please, someone help me out.

Maithili
Hi, Shiva,

The idea of reject inference is very simple. Let's assume a credit card environment. There are 100 applicants, out of which 50 are approved and booked. Therefore, we can only observe the adverse behavior, such as default and delinquency, of the 50 booked accounts. Again, let's assume that out of the 50 booked cards, 5 are bad (default / delinquency), i.e. an observed bad rate of 10%.

A natural thought is to build a model to "cherry pick" the bad guys and then apply the same model to all applicants. However, we can only observe the behavior of the applicants who were booked, which is 50, and not of all applicants, which is 100. The 50 rejected applicants were presumably turned down because they looked riskier, so a model built only on booked accounts looks better than it really is when applied to everyone. This is the so-called 'sample bias'. The same thing can happen in healthcare or direct marketing as well.

Luckily enough, many people have done excellent work on this problem. Please do some reading of Heckman's work. Greene at NYU has a paper in this area as well. And I believe there is also an implementation in R. If you use SAS (widely used in industry), take a look at proc qlim.

HTH.

--
==============================
WenSui Liu
Acquisition Risk, Chase
Email : wensui.x.liu@chase.com
Blog : statcompute.spaces.live.com
==============================
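To make the sample-bias point in the reply concrete, here is a minimal simulated sketch in base R. It is only an illustration under stated assumptions, not the method used by anyone in this thread: the variable names (`income`, `hidden`), the coefficients, and the "approve the better half" rule are all hypothetical. It shows that booked accounts have a lower default rate than the full applicant pool, and that a logistic model fitted only on booked accounts under-predicts default when applied to all applicants.

```r
# Hypothetical simulation of sample (selection) bias in credit scoring.
# None of the numbers below come from the thread.
set.seed(1)
n <- 50000

# Two applicant attributes: `income` is available to the scorecard,
# `hidden` influences approval and default but is not in the model.
income <- rnorm(n)
hidden <- rnorm(n)

# True default probability depends on both attributes.
p_default <- plogis(-2 - 1.0 * income - 1.0 * hidden)
default <- rbinom(n, 1, p_default)

# Approval rule (assumed): book the better half of applicants.
approval_score <- income + hidden
booked <- approval_score > median(approval_score)

applicants <- data.frame(default, income, booked)

# Default rates by group; the rejected rate is unobservable in practice.
rate_all      <- mean(applicants$default)
rate_booked   <- mean(applicants$default[applicants$booked])
rate_rejected <- mean(applicants$default[!applicants$booked])

# Logistic regression fitted on booked accounts only ...
fit_booked <- glm(default ~ income, family = binomial,
                  data = subset(applicants, booked))

# ... then applied to every applicant, booked or not.
pred_all <- predict(fit_booked, newdata = applicants, type = "response")

cat("Default rate, all applicants :", round(rate_all, 3), "\n")
cat("Default rate, booked only    :", round(rate_booked, 3), "\n")
cat("Default rate, rejected only  :", round(rate_rejected, 3), "\n")
cat("Mean predicted default (all) :", round(mean(pred_all), 3), "\n")
```

In real data the rejected applicants' outcomes are missing, so this comparison cannot be made directly; that gap is what reject inference and Heckman-type selection models try to address.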