Displaying 14 results from an estimated 14 matches for "plaintextdocument".
2009 Nov 01
4
convert list to Dataframe
Hi. I have a huge list called twitter:
> dim(twitter)
NULL
> str(twitter)
List of 1
$ :Classes 'PlainTextDocument', 'TextDocument', 'character' atomic
[1:35575] 11999;10:47:14;20;10;2009;ObamaLouverture;Trails Mixed Lessons For
Governance From Campaigner-in-chief: President obama jumps campaign 09
tuesday.. http://bit.ly/2eHMaN;Florida;USA;FL;;;27.6648274;-81.5157535
12210;10:47:37;20;10...
2015 Apr 10
5
Loop over many data frames
..."
if I extract only the first object from that list:
> txt[[1]]
<<VCorpus (documents: 1, metadata (corpus/indexed): 0/0)>>
if I want to see the contents of the first corpus
> inspect(txt[[1]])
<<VCorpus (documents: 1, metadata (corpus/indexed): 0/0)>>
[[1]]
<<PlainTextDocument (metadata: 7)>>
qB001.txt
it tells me things about the object, but the data are not there... it should
show me something like:
inspect(cbD02[1:1]) # I inspect the corpus cbD120, created by hand with the
statement cbD120<-Corpus(VectorSource(qT120))
#......corpus contents......
I went to...
2012 Jan 13
3
tm package, custom reader
I need help with creating a custom XML reader for use with the tm package. The
objective is to create a corpus for analysis. The files that I'm working with
come from solr and are in a funky XML format; nevertheless, I'm able to
parse the XML files using the solrDocs.R function provided by Duncan Temple
Lang.
The problem I'm having is that once I parse the document I need to create a
custom
2015 Apr 12
2
Loop over many data frames
...> <<VCorpus (documents: 1, metadata (corpus/indexed): 0/0)>>
>>
>> if I want to see the contents of the first corpus
>>
>> > inspect(txt[[1]])
>> <<VCorpus (documents: 1, metadata (corpus/indexed): 0/0)>>
>>
>> [[1]]
>> <<PlainTextDocument (metadata: 7)>>
>> qB001.txt
>>
>> it tells me things about the object, but the data are not there... it should
>> show me something like:
>>
>> inspect(cbD02[1:1]) # I inspect the corpus cbD120, created by hand with the
>> statement cbD120<-Corpus(Vecto...
2011 Apr 06
0
Curious treatment of entities in xmlTreeParse
..."),
DateTimeStamp = list("function",function(x) as.POSIXlt(Sys.time(),
tz = "GMT")),
Heading = list("node", "/item/title"),
ID = list("function", function(x) tempfile()),
Origin = list("node", "/item/link")),
doc = PlainTextDocument())
rss2Source <- function(x, encoding = "UTF-8")
XMLSource(x, function(tree)
XML::getNodeSet(XML::xmlRoot(tree),"/rss/channel/item"), rss2Reader,
encoding)
feed.rss2 <- rss2Source(url("http://scottbw.wordpress.com/feed/"))
corp1<-Corpus(feed.rss2, readerCo...
2013 Jan 15
0
Function failure in tm
...kage tm (that Milan Bouchet-Valat has been instrumental in producing).
I can get it to produce a corpus of class:
"VCorpus" "Corpus" "list"
class(mycorp[1]) returns
"VCorpus" "Corpus" "list"
and class(mycorp[[1]]) returns
"PlainTextDocument" "TextDocument" "character"
But now that I've got a corpus, none of the transformation functions work at all. They all return the following error (with the respective function name)
Error in UseMethod("stripWhitespace", x) :
no applicable method fo...
2012 Jan 08
2
cannot find package in Packages>>Install Packages
...(twitter)
>>>
>>
> This looks to have been converted into an R object through some process on
> some unspecified input. You should describe that process, and the only
> unambiguous method of doing so is by including the code.
>
>
> List of 1
>> $ :Classes 'PlainTextDocument', 'TextDocument', 'character' atomic
>> [1:35575] 11999;10:47:14;20;10;2009;**ObamaLouverture;Trails Mixed
>> Lessons For
>> Governance From Campaigner-in-chief: President obama jumps campaign 09
>> tuesday.. http://bit.ly/2eHMaN;Florida;**USA;FL;;;27.6...
2014 Jul 28
2
wordcloud and word table
Hello,
The reference you included (thanks for providing it) is quite clear and
easy to follow.
Have you been able to apply the same logic to your two speeches?
To settle any doubts, a good first step would be to attach the code you
are using so we can see whether there is an obvious error. That said, the
proper way for us to help you is with a reproducible example: code
+ data.
2011 Mar 24
2
Problem with Snowball & RWeka
Dear Forum,
when I try to use SnowballStemmer() I get the following error message:
"Could not initialize the GenericPropertiesCreator. This exception was
produced: java.lang.NullPointerException"
It seems to have something to do with either Snowball or RWeka; however, I
can't figure out what to do myself. If you could spend 5 minutes of your
valuable time to help me or give me a
2014 Jul 29
2
wordcloud and word table [Making progress]
...nv(enc2utf8(info.13), sub="byte")
>>> informes<-c(info.05, info.13)
>>> corpus<-Corpus(VectorSource(informes))
>>> inspect(corpus[1:2])
>> <<VCorpus (documents: 2, metadata (corpus/indexed): 0/0)>>
>>
>> [[1]]
>> <<PlainTextDocument (metadata: 7)>>
>> Derecho a la seguridad ciudadana. Toda persona tiene derecho a la
>> protección del Estado a través de los órganos de seguridad ciudadana
>> regulados por ley, frente a situaciones que constituyan amenazas,
>> vulnerabilidad o riesgo para la integrid...
2012 Jan 13
4
Troubles with stemming (tm + Snowball packages) under MacOS
...for a solution, but I have found nothing
useful.
Here is the full source code (all the libraries are already loaded):
------
Sys.setenv(NOAWT=TRUE)
source <- ReutersSource("reuters-21578.xml", encoding="UTF-8")
reuters <- Corpus(source)
reuters <- tm_map(reuters, as.PlainTextDocument)
reuters <- tm_map(reuters, removePunctuation)
reuters <- tm_map(reuters, tolower)
reuters <- tm_map(reuters, removeWords, stopwords("english"))
reuters <- tm_map(reuters, removeNumbers)
reuters <- tm_map(reuters, stripWhitespace)
reuters <- tm_map(reuters, stemDocument)...
2012 Jan 27
2
tm package: handling contractions
I tried making a wordcloud of Obama's State of the Union address using
the tm package to process the text
sotu <- scan(file="c:/R/data/sotu2012.txt", what="character")
sotu <- tolower(sotu)
corp <-Corpus(VectorSource(paste(sotu, collapse=" ")))
corp <- tm_map(corp, removePunctuation)
corp <- tm_map(corp, stemDocument)
corp <- tm_map(corp,
2013 Jan 08
1
tm: custom reader for readPlain
Hello:
I have a series of newspaper articles from a Canadian newspaper database (Canadian Newsstand) that look just like below.
I've read through this vignette (http://cran.r-project.org/web/packages/tm/vignettes/extensions.pdf) about creating a custom reader to extract meta-data, but I can't understand how to apply this in the context of a text document, rather than in the tabular format
2015 Apr 10
3
Loop over many data frames
Hi everyone!
I'm working on a text mining project and, because of the resources I have
available, I had to split the project's input text files into
many small files.
After turning each of these files into a separate corpus,
I can clean each corpus, look for n-grams, build each
termDocumentMatrix, and finally combine everything into a single TDM.
But