Good evening,

I'm getting a discretization that differs from the one described by Liu and Setiono (1997) in their papers when I use the Chi2 algorithm for feature selection with discretization.

As stated in the R documentation (discretization - R (from CRAN) <https://cran.r-project.org/web/packages/discretization/discretization.pdf>), the R package discretization offers the function chi2, which is based on the following papers:

Liu, H. and Setiono, R. (1995). Chi2: Feature selection and discretization of numeric attributes, Tools with Artificial Intelligence, 388-391.

Liu, H. and Setiono, R. (1997). Feature selection and discretization, IEEE Transactions on Knowledge and Data Engineering, Vol. 9, no. 4, 642-645.

I wrote the following R code, setting alpha and delta to the values used in the papers above; it prints the discretized data frame. I used the iris data, as in one of the examples in the two papers. The first paper states that alpha = 0.5 and delta = 5%, and that "the originally odd numbered data are selected for training (75 patterns) and rest for testing (75 patterns)". With this setup, the Sepal attributes should be removed.

    library(discretization)
    data(iris)

    # Training set: the odd-numbered rows of iris (75 patterns)
    df1 <- iris[FALSE, ]
    for (i in 1:nrow(iris)) {
      if (i %% 2 != 0) {
        df1 <- rbind(df1, iris[i, ])
      }
    }

    chi2(df1, alp = 0.5, del = 0.05)$Disc.data

The point is that the data frame printed by the last instruction shows that no attribute is removed. The discretized data frame still has 4 discretized attributes: if I understood the papers correctly, Sepal Length and Sepal Width should each have been discretized into just one interval by the Chi2 algorithm.

I have posted a question here:
http://stats.stackexchange.com/questions/247499/why-does-not-r-chi2-algorithm-discretize-in-the-same-manner-as-in-the-paper-by-l?noredirect=1#comment470974_247499
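The odd-row training split used in the code above can also be written without a loop. This is a minimal sketch using only base R; the names `odd_rows`, `train` and `test_set` are illustrative choices and do not appear in the original code.

```r
data(iris)

# Odd-numbered rows (1, 3, ..., 149) form the training set, as in the paper's split.
odd_rows <- seq(1, nrow(iris), by = 2)
train <- iris[odd_rows, ]

# The remaining even-numbered rows form the test set.
test_set <- iris[-odd_rows, ]

nrow(train)
nrow(test_set)
```

`train` should contain the same rows as `df1` from the loop, so it could be passed to `chi2()` in its place.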
Moreover, it's really hard to understand the cut points that the Chi2 algorithm implemented in R produces. For example, after

    res <- chi2(iris, 0.5, 0.05)

the result of

    cut(iris$Sepal.Length, res$cutp, labels=FALSE)

is different from

    res$Disc.data$Sepal.Length

Help me understand, please.

Best regards
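One possible source of the mismatch with `cut()`, offered as a guess and not as a statement about the package internals: `cut()` takes a single numeric vector of breaks, and that vector has to include the outer boundaries as well as the interior cut points. The sketch below uses base R only; the values in `cuts` and `x` are made up for illustration and are not output of `chi2()`.

```r
# Hypothetical interior cut points for one attribute (not actual chi2() output).
cuts <- c(5.45, 6.15)
x <- c(4.9, 5.5, 6.0, 6.3, 7.7)

# With interior cut points only, values outside (5.45, 6.15] get NA.
codes_inner <- cut(x, breaks = cuts, labels = FALSE)

# Adding open-ended outer boundaries assigns every value to an interval.
codes_full <- cut(x, breaks = c(-Inf, cuts, Inf), labels = FALSE)

# findInterval() numbers intervals from 0, so shift by one to compare.
# It uses left-closed intervals while cut() defaults to right-closed ones,
# so the two can differ for values exactly equal to a cut point.
codes_fi <- findInterval(x, cuts) + 1L

print(codes_inner)
print(codes_full)
print(codes_fi)
```

If `res$cutp` is a list with one element per attribute, as the package help page seems to indicate, the comparable call would use a single element such as `res$cutp[[1]]` instead of the whole list. That is an assumption to check against the package.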
Notice that this relates to an R _package_, which has a maintainer. You cannot expect general R users or developers to know about the details of the package. It doesn't look like there is documentation beyond the help pages, so you may need to contact the maintainer or study the actual code.

-pd

> On 23 Nov 2016, at 17:08, Luke Skywalker <mattered91 at gmail.com> wrote:
>
> [...]
>
> Help me understand, please
>
> Best regards
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com
What does it mean to "have a maintainer"? Is he a third party? Is he an individual developer whose package you install at your own risk? Are packages created by maintainers not tested?

Anyway, I wrote to him. I'm waiting for a response.

Regards

On 23 Nov 2016 at 22:21, "peter dalgaard" <pdalgd at gmail.com> wrote:

> Notice that this relates to an R _package_, which has a maintainer.
> [...]