thr3ads.net - similar to: "Sorting text docs based on document meta values in tm()"

Displaying 14 results from an estimated 14 matches similar to: "Sorting text docs based on document meta values in tm()"

R hangs at NGramTokenizer

2013 Sep 26

Hi: I try to construct a Document-Term Matrix from a corpus. The commands I used are: > library(parallel) > library(tm) > library(RWeka) > library(topicmodels) > library(RTextTools) > cl=makeCluster(detectCores()) > invisible(clusterEvalQ(cl, library(tm))) > invisible(clusterEvalQ(cl, library(RWeka))) > invisible(clusterEvalQ(cl, library(topicmodels))) >

tm_map help

2012 Feb 26

Hi all, I am trying to do some text mining with Twitter and I am getting this error when I use tm_map: Error in structure(names(sapply(possibleCompletions, "[", 1)), names = x) : 'names' attribute [1] must be the same length as the vector [0]. Has anyone had/seen this error before? The code I have is shown below and this error only occurs with #qantas, hashtags like #asx,

merging corpora and metadata

2011 Nov 17

Greetings! I lose all my metadata after concatenating corpora. This is an example of what happens: > meta(corpus.1) MetaID cid fid selfirst selend fname 1 0 1 11 2169 2518 WCPD-2001-01-29-Pg217.scrb 2 0 1 14 9189 9702 WCPD-2003-01-13-Pg39.scrb 3 0 1 14 2109 2577 WCPD-2003-01-13-Pg39.scrb .... .... 17 0

convert list to Dataframe

2009 Nov 01

Hi. I have a huge list called twitter: > dim(twitter) NULL > str(twitter) List of 1 $ :Classes 'PlainTextDocument', 'TextDocument', 'character' atomic [1:35575] 11999;10:47:14;20;10;2009;ObamaLouverture;Trails Mixed Lessons For Governance From Campaigner-in-chief: President obama jumps campaign 09 tuesday..

Library (tm) Error: could not find function "TermDocMatrix".

2010 Apr 23

Hi List, I have the following code and error. I have tried other code and get the same problem. > reut21578 <- system.file("texts", "crude", package = "tm") > (r <- Corpus(DirSource(reut21578), readerControl = list(reader = readReut21578XMLasPlain))) A corpus with 20 text documents > (r <- Corpus(DirSource(reut21578), readerControl =

Stemming functions only work on the last word of plain text documents

2011 Sep 05

Hello, I want to use the SnowballStemmer on a collection of plain text documents. However, when I apply it to my corpus using the tm_map function, it only stems the last word of each document (the problem is the same for wordStem, and stemDocument does not work at all). An example: > path <- c("c:\path\to\directory") # collection of plain text documents > corp <-

rake error

2009 Jul 20

When I run rake test:units I get this error: 292 tests, 350 assertions, 2 failures, 13 errors rake aborted! Command failed with status (1): [/usr/local/bin/ruby -I"lib:test" "/ usr/loc...] This error just showed up yesterday --- I have no idea how I caused it. Here is my gem list in case that helps: actionmailer (2.3.2, 2.2.2) actionpack (2.3.2, 2.2.2) activerecord (2.3.2, 2.2.2)

Invalid input error in tm package

2010 Jan 22

Hello, I am working on "tm" package. I have 2 pdf files saved in the directory D:/Files I issued the following commands (marked in red bold) for which I got some errors and warnings (marked in bold) *surgj <- Corpus(DirSource("D:/Files"), readerControl = list(language = "ansi"))* *Warning messages: 1: In readLines(y, encoding = x$Encoding) : incomplete final

cannot find package in Packages>>Install Packages

2012 Jan 08

Hi. I am trying to install a package called DMwR http://cran.r-project.org/web/packages/DMwR/index.html located here: http://cran.r-project.org/bin/windows/contrib/r-release/DMwR_0.2.1.zip on windows 7. I am using R 2.10.1. I also tried typing something like this but it did not work well. install.packages(c(" http://cran.r-project.org/bin/windows/contrib/r-release/DMwR_0.2.1.zip

Efficiently Extracting Meta Data from TM Corpora

2009 Aug 13

I'm using text miner (the "tm" package) to process large numbers of blog and message board postings (about 245,000). Does anyone have any advice for how to efficiently extract the meta data from a corpus of this size? TM does a great job of using MPI for many functions (e.g. tmMap) which greatly speed up the processing. However, the "meta" function that I need does not

+ camping/session

2006 Feb 21

Camping now comes with a sessioning class, checked in tonight. To get sessions working for your application: 1. require 'camping/session' 2. include Camping::Session in your application's toplevel module. 3. In your application's create method, add a call to Camping::Models::Schema.create_schema 4. Throughout your application, use the @state

putting away HashWithIndifferentAccess

2007 Sep 25

Hey, campineros. And many good handshakes to zimbatm for getting some patches applied. So, yeah, I'd really like to get rid of any serious dependencies with this 1.6 release. Anything that's not in stdlib has to go. Of course, camping-omnibus will still assume the whole ActiveRecord, Markaby, Mongrel setup that's in the history books. Metaid can be removed and

Packages build for Solaris ? As CSW packages ?

2006 Dec 01

Well, imitation is the highest form of flattery, they say. So I'm surprised to see these packages neatly built to install into /opt/csw correctly and yet they exist somewhere else and have nothing to do with us here at Blastwave. Fascinating. I guess we can always send an email to the person doing this and just ask if they want those packages in testing and then into the catalog for

Issue with puppet file serving api not parsing yaml content correctly

2011 Jul 06

I am working on building a facter tag based node classifier similar to https://github.com/jordansissel/puppet-examples/tree/master/nodeless-puppet/. However, I have run into an issue where I cannot use puppet's require file ability to push the yaml file containing the facts file to the client because it would require two runs of puppet to pick up changes. Consequently, I have written into
