I need to calculate information gain using the FSelector package, as
feature selection for classifying documents.
I executed the code below:
library(tm)
library(NLP)
library(FSelector)
doc <- c( "The sky is blue.", "The sun is bright today.",
"The sun in the sky is bright.",
"We can see the shining sun, the bright sun." )
doc_corpus <- Corpus( VectorSource( doc ) )
control_list <- list( removePunctuation = TRUE, stopwords = TRUE,
tolower = TRUE )
tdm <- TermDocumentMatrix( doc_corpus, control = control_list )
( tf <- as.matrix(tdm ) )
tf
tf1<-t(tf)
tfdataframe<-data.frame(tf1)
tfdataframe
tfdataframe$doc<-c("1","2","3","4")
tfdataframe
#information gain based on term frequency
infgain <- information.gain(doc~.,tfdataframe )
infgain
and I got this output:
> infgain
attr_importance
blue 0.0000000
bright 0.0000000
can 0.0000000
see 0.0000000
shining 0.0000000
sky 0.6931472
sun 0.0000000
today 0.0000000
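
For comparison, here is a minimal sketch of what I understand information
gain to be, H(class) - H(class | term), computed by hand in base R on the
presence or absence of a term. The helpers entropy() and info_gain() are my
own names, not part of FSelector, and the sketch assumes natural logs and
skips any preprocessing that information.gain() may apply internally, so the
numbers need not agree with the output above.

```r
# Hypothetical helper (not from FSelector): Shannon entropy in nats
entropy <- function(x) {
  p <- table(x) / length(x)
  p <- p[p > 0]
  -sum(p * log(p))
}

# Hypothetical helper: information gain of attribute a about class y,
# i.e. H(y) - H(y | a)
info_gain <- function(a, y) {
  groups <- split(y, a)
  cond <- sum(sapply(groups, function(s) length(s) / length(y) * entropy(s)))
  entropy(y) - cond
}

# Counts of two terms in the four documents above
sky  <- c(1, 0, 1, 0)   # "sky" occurs in documents 1 and 3
blue <- c(1, 0, 0, 0)   # "blue" occurs in document 1 only
cls  <- c("1", "2", "3", "4")

info_gain(sky, cls)
info_gain(blue, cls)
```

Done this way, sky comes out as log(2), about 0.6931472, which matches the
value above, but blue comes out at about 0.562 rather than 0.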
Is this output logically correct?
I am totally confused.
Could anyone help me, please?
Thanks in advance.