Displaying 3 results from an estimated 3 matches for "readreut21578xml".
2012 May 29
1
package tm: reading XML files
...he meta data is lost, and the files are
interpreted as plain text.
I use the following command, where the indicated directory contains all
reuters 21578 documents as separate XML files:
> reuters21578 <- Corpus(DirSource("C:/Data/Reuters/preprocessed"),
readerContol=list(reader=readReut21578XML))
I'm running R 2.15.0 under Windows XP.
Has anybody else encountered this problem and found a cause or solution?
Best regards,
-Ad Feelders
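A possible cause, offered as a guess rather than a confirmed diagnosis: the command above spells the argument `readerContol`, not `readerControl`. If `Corpus()` in that version of tm accepted extra arguments through `...` (an assumption, not checked here), a misspelt argument name would be silently swallowed and the default plain-text reader used, which would match the symptom described. The sketch below mimics that argument handling in base R with a hypothetical stand-in function, `toy_corpus`; it is not tm's code and the reader names are plain strings used only for illustration.

```r
# Hypothetical stand-in for a constructor with a default reader and `...`.
# This is NOT tm's implementation; it only mimics the argument handling.
toy_corpus <- function(x, readerControl = list(reader = "readPlain"), ...) {
  # Return the name of the reader that would actually be used.
  readerControl$reader
}

# Misspelt argument name: "readerContol" is not a prefix of "readerControl",
# so it is not partially matched; it falls into `...` and the default is kept.
misspelt <- toy_corpus("dir", readerContol = list(reader = "readReut21578XML"))

# Correctly spelt argument name: the requested reader is used.
correct <- toy_corpus("dir", readerControl = list(reader = "readReut21578XML"))

print(misspelt)
print(correct)
```

If that assumption holds, retrying the original call with the argument spelt `readerControl` would be a cheap first check.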
2009 Dec 11
0
readHTML within tm package
...be used to read HTML documents into a corpus.
However, when I try to use that routine I get an error. When I run
getReaders (below) readHTML isn't listed.
> getReaders()
[1] "readDOC" "readGmane"
[3] "readPDF" "readReut21578XML"
[5] "readReut21578XMLasPlain" "readPlain"
[7] "readRCV1" "readTabular"
Am I missing something? Is there an extra install I'm missing, or has the
routine been removed or replaced?
Thanks, Peter
Oh, yes, ru...
2010 Feb 04
1
How to read HTML or TEXT file with tm package
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20100204/a3069c99/attachment.pl>