thr3ads.net - similar to: "recursively count the words occurrence in the text files"

Displaying 9 results from an estimated 9 matches similar to: "recursively count the words occurrence in the text files"

how to cluster rows of words in a text file

2012 Mar 23

Hi: I am trying to cluster the rows of a text file with kmeans. I load the data as follows:

file1 <- read.csv("somefile.csv")

and the file, when viewed, shows the following lines of words:

> file1
1 word1 word3 word4 word1
2 word1 word4 word3 word1
3 word4 word2 word4 word3
4 word4 word2 word1 word3
5 word2 word2 word4 word2

file_as_matrix <- as.matrix(file1); Now,

sorting during xtabs? sorting by "individual" order?

2005 Nov 08

Hey all together, while refactoring a package (before it is released), I ran across the following problem. I have two directories with different text files. I want to read the first and construct a document-term matrix from it (every term = word in a row, every file in a column, occurrence frequencies form the values). The second directory contains different files. It needs to be read in to also
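A term-by-file count matrix of the kind this post describes can be sketched in base R with xtabs(), which the thread title mentions. This is only an illustration under stated assumptions: the two in-memory documents and their names (a.txt, b.txt) are hypothetical stand-ins for the files of one directory, and splitting on whitespace is a simplification of real tokenization.

```r
# Hypothetical documents, standing in for the text files of one directory.
docs <- list(a.txt = "word1 word3 word4 word1",
             b.txt = "word4 word2 word4 word3")

# Split each document into words on runs of whitespace.
tokens <- lapply(docs, function(txt) strsplit(txt, "[[:space:]]+")[[1]])

# Long format: one row per word occurrence, tagged with its file.
long <- data.frame(term = unlist(tokens, use.names = FALSE),
                   file = rep(names(tokens), lengths(tokens)))

# Cross-tabulate: terms in rows, files in columns, counts as values.
tdm <- xtabs(~ term + file, data = long)
print(tdm)
```

To keep a second directory on the same term rows, one option is to turn term into a factor whose levels are fixed from the first directory before calling xtabs().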

reading tokens

2009 Nov 03

Greetings, I am not familiar with processing text in R. Can someone tell me how to read each line of words as separate elements in a list? For example, I would like to turn:

word1 word2 word3
word2 word4

into a list of length two, with three character elements in the first list and two elements in the second. I know that this should be easy, but I am a little confused by the text functions. Thanks in
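One way to get such a list in base R is strsplit(), which returns one character vector per input line. The sketch below uses a hypothetical in-memory vector of lines; for a real file, readLines() on the file path would supply the same kind of vector.

```r
# Hypothetical input: two lines of space-separated words, as in the post.
lines <- c("word1 word2 word3", "word2 word4")

# strsplit() returns a list with one character vector per input line;
# "[[:space:]]+" treats any run of whitespace as a single separator.
tokens <- strsplit(trimws(lines), "[[:space:]]+")

str(tokens)
```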

findAssocs()

2011 Sep 26

I am trying to find the math behind the "tm" package findAssocs(). ?findAssocs does not say anything besides "association" and "correlate". Usually entering "findAssocs" at the CLI gives the code for an R function, but in this case I obtain: function (x, term, corlimit) UseMethod("findAssocs", x) <environment: namespace:tm> Any ideas?
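The UseMethod() stub in the output above is what an S3 generic looks like when printed; the real code sits in class-specific methods. The sketch below shows the mechanism with a hypothetical generic named describe, which stands in for findAssocs and is not part of tm.

```r
# Hypothetical S3 generic, standing in for tm's findAssocs().
describe <- function(x, ...) UseMethod("describe")
describe.default <- function(x, ...) "default method"
describe.matrix <- function(x, ...) "matrix method"

# Printing the generic shows only the UseMethod() stub.
print(describe)

# The class-specific method holds the actual code.
m <- getS3method("describe", "matrix")
print(m)

# Dispatch picks the method that matches the class of the argument.
result <- describe(matrix(1:4, 2))
print(result)
```

With tm loaded, methods("findAssocs") should list the available methods, and getAnywhere() on one of the listed names should print its source; the exact method names depend on the installed tm version.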

upgrade 2.2.8a -> 3.0 Debian DOS long filename problem

2003 Nov 03

Hi, I tried to upgrade from Samba 2.2.8a to 3.0. Generally speaking it worked fine and speed went up tremendously, BUT since then the DOS conversion of long file names is freaking me out. Long file names used to produce a readable short version plus ~1 or ~x, where x stands for a number. Now long file names produce some cryptic 8-letter name plus extension. Listing in a DOS box

How does findAssocs() calculate the correlation value ??

2017 Jul 07

Hi: I want to know the math behind the "tm" package findAssocs(). I found that someone had asked this question before and received a good explanation from Rick: http://r.789695.n4.nabble.com/findAssocs-td3845751.html But I still don't understand how to calculate the correlation value between the two vectors. For example: # Correlation word2 with word3
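Assuming findAssocs() reports the ordinary Pearson correlation between term columns of the document-term matrix (an assumption here, to be checked against the installed tm source), the value for two terms can be reproduced in base R. The small matrix of counts below is hypothetical, not the data from the post.

```r
# Hypothetical document-term matrix: rows are documents, columns are terms.
dtm <- matrix(c(1, 0, 2,
                0, 1, 1,
                2, 1, 3,
                1, 2, 0),
              nrow = 4, byrow = TRUE,
              dimnames = list(paste0("doc", 1:4),
                              c("word1", "word2", "word3")))

x <- dtm[, "word2"]  # counts of word2 across documents
y <- dtm[, "word3"]  # counts of word3 across documents

# Pearson correlation written out by hand:
# covariance of x and y divided by the product of their spreads.
r_manual <- sum((x - mean(x)) * (y - mean(y))) /
  sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2))

# The built-in cor() uses the Pearson method by default.
r_builtin <- cor(x, y)

print(c(manual = r_manual, builtin = r_builtin))
```

Each vector holds one term's counts across all documents, so the correlation measures whether the two words tend to be frequent in the same documents.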

Extending/Modifying QueryParser

2007 Jul 07

Hi, I've implemented synonym searching in my Rails application, and I have an idea I'd like to implement but can't figure out how to do it. The idea is to give the end user the choice of whether or not to search for the synonyms of a word, preferably by extending the query language to parse a construct similar to '%word1' and

MS Word from samba share

1999 Jul 20

A few days ago I sent a message titled "A samba question or a mutt question". I got a few replies, but they suggested wide work-arounds to the problem. I have done some more work, and it is clear that it has nothing to do with the mail side of what I was trying. The question or problem I have is essentially this: How do you double click on a MS Word icon to bring the document into MS Word

How to add a column to dtm showing a part from directory source?

2010 Apr 04

Hello Experts, I'm new to R and having trouble with my graduation project. I have 20 subfolders containing almost 20000 txt files. What I need to do is create a dtm and add a column to it showing the "class" information of the txt files. My directory source is like "C:\\R\\20news-18828\\comp.graphics" for the comp.graphics subfolder. I need to take only
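If the "class" is taken to be the name of the subfolder that holds each file (an assumption, since the post is cut off), it can be read off the file path with dirname() and basename(). The file names and the second folder below are hypothetical, and forward slashes are used so the sketch behaves the same on any platform.

```r
# Hypothetical file paths; in practice list.files(root, recursive = TRUE,
# full.names = TRUE) would produce a vector like this.
paths <- c("C:/R/20news-18828/comp.graphics/file1.txt",
           "C:/R/20news-18828/comp.graphics/file2.txt",
           "C:/R/20news-18828/sci.space/file3.txt")

# dirname() drops the file name; basename() keeps the last folder name.
classes <- basename(dirname(paths))

# One row per file, ready to be joined to the rows of a dtm.
labels <- data.frame(file = basename(paths), class = classes)
print(labels)
```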
