thr3ads.net - search: "plaintextdocument"

Displaying 14 results from an estimated 14 matches for "plaintextdocument".

2009 Nov 01

convert list to Dataframe

Hi. I have a huge list called twitter: > dim(twitter) NULL > str(twitter) List of 1 $ :Classes 'PlainTextDocument', 'TextDocument', 'character' atomic [1:35575] 11999;10:47:14;20;10;2009;ObamaLouverture;Trails Mixed Lessons For Governance From Campaigner-in-chief: President obama jumps campaign 09 tuesday.. http://bit.ly/2eHMaN;Florida;USA;FL;;;27.6648274;-81.5157535 12210;10:47:37;20;10...

2015 Apr 10

Loop over many data frames

..." if I extract only the first object from that list: > txt[[1]] <<VCorpus (documents: 1, metadata (corpus/indexed): 0/0)>> if I want to see the content of the first corpus > inspect(txt[[1]]) <<VCorpus (documents: 1, metadata (corpus/indexed): 0/0)>> [[1]] <<PlainTextDocument (metadata: 7)>> qB001.txt it tells me things about the object, but the data is not there... it should show me something like: inspect(cbD02[1:1]) #inspect the corpus cbD120, created by hand with the statement cbD120<-Corpus(VectorSource(qT120)) #......corpus content...... I went to...

2012 Jan 13

tm package, custom reader

I need help with creating a custom XML reader for use with the tm package. The objective is to create a corpus for analysis. The files I'm working with come from Solr and are in a funky XML format; nevertheless, I'm able to parse the XML files using the solrDocs.R function provided by Duncan Temple Lang. The problem I'm having is that once I parse the document I need to create a custom

2015 Apr 12

Loop over many data frames

...> <<VCorpus (documents: 1, metadata (corpus/indexed): 0/0)>> >> >> if I want to see the content of the first corpus >> >> > inspect(txt[[1]]) >> <<VCorpus (documents: 1, metadata (corpus/indexed): 0/0)>> >> >> [[1]] >> <<PlainTextDocument (metadata: 7)>> >> qB001.txt >> >> it tells me things about the object, but the data is not there... it should >> show me something like: >> >> inspect(cbD02[1:1]) #inspect the corpus cbD120, created by hand with the >> statement cbD120<-Corpus(Vecto...

2011 Apr 06

Curious treatment of entities in xmlTreeParse

..."), DateTimeStamp = list("function",function(x) as.POSIXlt(Sys.time(), tz = "GMT")), Heading = list("node", "/item/title"), ID = list("function", function(x) tempfile()), Origin = list("node", "/item/link")), doc = PlainTextDocument()) rss2Source <- function(x, encoding = "UTF-8") XMLSource(x, function(tree) XML::getNodeSet(XML::xmlRoot(tree),"/rss/channel/item"), rss2Reader, encoding) feed.rss2 <- rss2Source(url("http://scottbw.wordpress.com/feed/")) corp1<-Corpus(feed.rss2, readerCo...

2013 Jan 15

Function failure in tm

...kage tm (that Milan Bouchet-Valat has been instrumental in producing). I can get it to produce a corpus of class: "VCorpus" "Corpus" "list" class(mycorp[1]) returns "VCorpus" "Corpus" "list" and class(mycorp[[1]]) returns "PlainTextDocument" "TextDocument" "character" But now that I've got a corpus, none of the transformation functions work at all. They all return the following error (with the respective function name) Error in UseMethod("stripWhitespace", x) : no applicable method fo...

2012 Jan 08

cannot find package in Packages>>Install Packages

...(twitter) >>> >> > This looks to have been converted into an R object through some process on > some unspecified input. You should describe that process, and the only > unambiguous method of doing so is by including the code. > > > List of 1 >> $ :Classes 'PlainTextDocument', 'TextDocument', 'character' atomic >> [1:35575] 11999;10:47:14;20;10;2009;**ObamaLouverture;Trails Mixed >> Lessons For >> Governance From Campaigner-in-chief: President obama jumps campaign 09 >> tuesday.. http://bit.ly/2eHMaN;Florida;**USA;FL;;;27.6...

2014 Jul 28

wordcloud and word table

Hello, The reference you included (thanks for providing it) is quite clear and easy to follow. Have you been able to apply the same logic to your two speeches? To start clearing up doubts, you could attach the code you are using so we can check for any obvious error. The proper way for us to help you, though, is with a reproducible example: code + data.

2011 Mar 24

Problem with Snowball & RWeka

Dear Forum, when I try to use SnowballStemmer() I get the following error message: "Could not initialize the GenericPropertiesCreator. This exception was produced: java.lang.NullPointerException" It seems to have something to do with either Snowball or RWeka; however, I can't figure out what to do myself. If you could spend 5 minutes of your valuable time to help me or give me a

2014 Jul 29

wordcloud and word table [Making progress]

...nv(enc2utf8(info.13), sub="byte") >>> informes<-c(info.05, info.13) >>> corpus<-Corpus(VectorSource(informes)) >>> inspect(corpus[1:2]) >> <<VCorpus (documents: 2, metadata (corpus/indexed): 0/0)>> >> >> [[1]] >> <<PlainTextDocument (metadata: 7)>> >> Derecho a la seguridad ciudadana. Toda persona tiene derecho a la >> protección del Estado a través de los órganos de seguridad ciudadana >> regulados por ley, frente a situaciones que constituyan amenazas, >> vulnerabilidad o riesgo para la integrid...

2012 Jan 13

Troubles with stemming (tm + Snowball packages) under MacOS

...for a solution, but I have found nothing useful. Here is the full source code (all the libraries are already loaded): ------ Sys.setenv(NOAWT=TRUE) source <- ReutersSource("reuters-21578.xml", encoding="UTF-8") reuters <- Corpus(source) reuters <- tm_map(reuters, as.PlainTextDocument) reuters <- tm_map(reuters, removePunctuation) reuters <- tm_map(reuters, tolower) reuters <- tm_map(reuters, removeWords, stopwords("english")) reuters <- tm_map(reuters, removeNumbers) reuters <- tm_map(reuters, stripWhitespace) reuters <- tm_map(reuters, stemDocument)...

2012 Jan 27

tm package: handling contractions

I tried making a wordcloud of Obama's State of the Union address using the tm package to process the text sotu <- scan(file="c:/R/data/sotu2012.txt", what="character") sotu <- tolower(sotu) corp <-Corpus(VectorSource(paste(sotu, collapse=" "))) corp <- tm_map(corp, removePunctuation) corp <- tm_map(corp, stemDocument) corp <- tm_map(corp,

2013 Jan 08

tm: custom reader for readPlain

Hello: I have a series of newspaper articles from a Canadian newspaper database (Canadian Newsstand) that look just like below. I've read through this vignette (http://cran.r-project.org/web/packages/tm/vignettes/extensions.pdf) about creating a custom reader to extract meta-data, but I can't understand how to apply this in the context of a text document, rather than in the tabular format

2015 Apr 10

Loop over many data frames

Hello everyone! I am working on a text mining project and, because of the resources I have available, I had to split the project's input text files into many small files. After turning each of these files into a separate corpus, I can clean each corpus, search for n-grams, build each termDocumentMatrix, and finally combine everything into a single TDM. But
