Will Ebert
2017-Mar-14 17:46 UTC
[R] Document Term Matrix will not maintain decimal places of numbers or capture all terms
Before I updated my version of RStudio (1.0.136), everything worked great. With the update, something has changed with Document Term Matrix in the 'tm' package. I want to create a DTM whose terms are numbers. For instance, if I have a .csv with one column as shown below:

    x
    1.01
    11.21
    123.35
    212.11

I want the column names in my term matrix to look like this:

    1.01  11.21  123.35  212.11
       1      0       0       0
       0      1       0       0
       0      0       1       0
       0      0       0       1

But instead it looks like this:

    123  212
      0    0
      0    0
      1    0
      0    1

Here's the code that used to work:

    corpus = Corpus(VectorSource(x))
    dtm = DocumentTermMatrix(corpus)
    dtm_df = as.data.frame(as.matrix(dtm))

I have tried uninstalling and reinstalling everything, and I have tried older versions (RStudio 0.99.489 and R 3.3.1), but I get the same results. I asked others to test it and it works for them. However, I also had someone download R, Rtools, and RStudio from scratch to test this, and they got the same results I did.

I have no idea what has happened and would greatly appreciate help with this, as it is extremely urgent.

Thanks in advance,

Will
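One possible explanation, which is not confirmed anywhere in this thread, is that the terms are now being split at the decimal point and that pieces shorter than three characters are then dropped. The sketch below reproduces both matrices from the post in base R under that assumption. The helper `build_dtm` and the two tokenizers are hypothetical names written for this illustration and are not part of 'tm'. The guarded section at the end shows settings one might try in 'tm' itself (`VCorpus` together with the `tokenize` and `wordLengths` control options); whether that restores the old behaviour depends on the installed 'tm' version.

```r
# Values from the post, stored as character strings so that no decimals are lost.
x <- c("1.01", "11.21", "123.35", "212.11")

# Hypothetical helper (not part of tm): a tiny document-term matrix builder.
# Each element of `docs` is one document; `tokenizer` splits it into terms,
# and terms shorter than `min_chars` characters are discarded.
build_dtm <- function(docs, tokenizer, min_chars = 1L) {
  tokens <- lapply(docs, function(d) {
    tok <- tokenizer(d)
    tok[nchar(tok) >= min_chars]
  })
  terms <- unique(unlist(tokens))
  dtm <- matrix(0L, nrow = length(docs), ncol = length(terms),
                dimnames = list(Docs = as.character(seq_along(docs)),
                                Terms = terms))
  for (i in seq_along(tokens)) {
    counts <- table(tokens[[i]])
    if (length(counts) > 0) dtm[i, names(counts)] <- as.integer(counts)
  }
  dtm
}

# Tokenizer 1: split on whitespace only, so "123.35" stays one term.
split_on_space <- function(d) strsplit(d, "[[:space:]]+")[[1]]

# Tokenizer 2: also split on punctuation, so "123.35" becomes "123" and "35".
split_on_punct <- function(d) strsplit(d, "[[:space:][:punct:]]+")[[1]]

dtm_whole <- build_dtm(x, split_on_space, min_chars = 3L)
dtm_split <- build_dtm(x, split_on_punct, min_chars = 3L)

print(dtm_whole)  # one column per original number
print(dtm_split)  # only the pieces with at least three characters survive

# Settings to try in tm itself, run only if the package is installed.
if (requireNamespace("tm", quietly = TRUE)) {
  corpus <- tm::VCorpus(tm::VectorSource(x))
  dtm <- tm::DocumentTermMatrix(
    corpus,
    control = list(tokenize = tm::scan_tokenizer,  # split on whitespace only
                   wordLengths = c(1, Inf))        # keep short terms as well
  )
  print(as.matrix(dtm))
}
```

If the values come from a .csv file, reading the column as text (for example with `colClasses = "character"` in `read.csv()`) avoids a separate problem: converting a number such as 1.10 to a string drops the trailing zero.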
Sarah Goslee
2017-Mar-15 17:12 UTC
[R] Document Term Matrix will not maintain decimal places of numbers or capture all terms
This question is a strong argument for not posting in HTML. I, at least, cannot make sense of the example.

It's also a strong argument for providing a small reproducible example, using dput() to provide the data. Someone is more likely to be able to help if we don't have to guess what you meant.

Sarah

-- 
Sarah Goslee
http://www.functionaldiversity.org
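As a minimal sketch of the dput() suggestion: the data frame below is an assumption about what the original .csv contained, reconstructed from the garbled example, with the values held as character strings. The text that dput() prints is plain ASCII, so it survives plain-text email and can be pasted straight back into R.

```r
# Assumed contents of the original one-column .csv, kept as character strings.
df <- data.frame(x = c("1.01", "11.21", "123.35", "212.11"),
                 stringsAsFactors = FALSE)

# Print a text representation of the object that can be pasted into an email.
dput(df)

# A reader recreates the object by evaluating that text.
dumped <- paste(capture.output(dput(df)), collapse = "\n")
restored <- eval(parse(text = dumped))
identical(restored, df)
```

Posting the dput() output together with the three lines of 'tm' code and the output of sessionInfo() would let readers run exactly the same input against their own package versions.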