I wish to calculate the weight of evidence (WOE) of a variable x that is positively skewed: over 6000 of the observations are 999, and only about 200 range from 1 to 27. I used the code

    IV <- create_infotables(data=Test[,-1], y="class", bins=10)

However, no matter what number I use for the bins parameter, I get only 2 bins, [1,27] and [999,999]. Is there any way I can look into [1,27] more closely, because those values represent a lot? The output from R is shown below:

    Table$pdays
          pdays    N    Percent        WOE        IV
    1    [1,27]  243 0.03807584  2.6743166 0.5267751
    2 [999,999] 6139 0.96192416 -0.2230081 0.5707022

Thank you very much!!
It seems rather likely that 999 is not really a measured value but rather a missing-value indicator.

-- 
David.

On 3/10/19 1:54 PM, wong bowie wrote:
> I wish to calculate the weight of evidence of a variable x, which is
> positively skewed, with over 6000 of the observations are 999 but only 200
> range from 1-27.
> [...]
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
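A minimal sketch of why the sentinel value matters, using simulated data rather than the poster's. With about 96% of the values at 999, every decile cut point lands on 999, so a quantile-based binning has nothing left to split on. That create_infotables derives its bins from quantiles is an assumption here, not something confirmed in this thread. The vector pdays and both cut-point objects are made up for illustration.

```r
set.seed(1)

# Simulated stand-in for pdays: 243 real values in 1..27, 6139 sentinel 999s
pdays <- c(sample(1:27, 243, replace = TRUE), rep(999, 6139))

# Decile cut points on the raw variable: all of them fall on the sentinel
raw_cuts <- unique(quantile(pdays, probs = seq(0.1, 0.9, by = 0.1), type = 3))

# Recode the sentinel as NA and recompute on the observed values only
pdays_na <- replace(pdays, pdays == 999, NA)
clean_cuts <- unique(quantile(pdays_na, probs = seq(0.1, 0.9, by = 0.1),
                              type = 3, na.rm = TRUE))

print(raw_cuts)
print(clean_cuts)
```

Once 999 is treated as NA, the cut points spread across 1..27, which is what finer bins need.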
You are right. This variable actually represents the number of days that passed after a client was contacted; 999 means the client has never been contacted. But I am not supposed to change the value, am I?

On 2019-03-10 at 10:48, David Winsemius <dwinsemius at comcast.net> wrote:
> Seems rather likely that 999 is not really a measured value but rather
> is a missing value indicator.
> [...]
Hi Bowie,
As David suggested, you can substitute the R missing value (NA) for 999 (probably an SPSS missing value). If you don't want to change it, you could probably just subset your data like this:

    IV <- create_infotables(data=Test[Test[[n]] != 999, -1], y="class", bins=10)

where "n" is the column number in Test of the variable of interest.

Jim

On Mon, Mar 11, 2019 at 9:45 AM wong bowie <bowiewongg at gmail.com> wrote:
> I wish to calculate the weight of evidence of a variable x, which is
> positively skewed, with over 6000 of the observations are 999 but only 200
> range from 1-27.
> [...]
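The subsetting idea can also be sketched in base R, without the Information package, to look inside the 1..27 range directly. Everything below is hypothetical: the data frame Test is simulated, the quartile bins are an arbitrary choice, and WOE is taken as the log of the share of class 1 over the share of class 0 in each bin, a sign convention that varies between sources.

```r
set.seed(42)

# Hypothetical stand-in for Test: binary class plus pdays with a 999 sentinel
n_real <- 243
n_never <- 6139
Test <- data.frame(
  pdays = c(sample(1:27, n_real, replace = TRUE), rep(999, n_never)),
  class = c(rbinom(n_real, 1, 0.6), rbinom(n_never, 1, 0.1))
)

# Keep only clients who were actually contacted
contacted <- Test[Test$pdays != 999, ]

# Quartile-based bins within the observed range (duplicate cut points dropped)
breaks <- unique(quantile(contacted$pdays, probs = seq(0, 1, by = 0.25)))
contacted$bin <- cut(contacted$pdays, breaks = breaks, include.lowest = TRUE)

# WOE per bin: log(share of class 1 in bin / share of class 0 in bin)
tab <- table(contacted$bin, contacted$class)
p1 <- tab[, "1"] / sum(tab[, "1"])
p0 <- tab[, "0"] / sum(tab[, "0"])
woe <- log(p1 / p0)

# Information value: sum over bins of (p1 - p0) * WOE
iv <- sum((p1 - p0) * woe)

print(data.frame(N = as.vector(rowSums(tab)), WOE = woe))
print(iv)
```

Note that WOE computed this way describes only the contacted clients; it is not comparable with the WOE of the [1,27] bin in the full data, where the 999 rows enter the class totals.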
You are asking the wrong question. The right question is, "Why are so many values missing?" Is it because they were censored, not reported for some reason, lost to instrument failure, ...? Until you answer that question, any analysis you do is garbage.

I strongly recommend you consult a competent data analyst.

Bert

On Sun, Mar 10, 2019, 9:21 PM Jim Lemon <drjimlemon at gmail.com> wrote:
> Hi Bowie,
> As David suggested, you can substitute the R missing value (NA) for
> 999 (probably an SPSS missing value).
> [...]