thr3ads.net - R help - [R] package tm: reading XML files [May 2012]

Ad Feelders

2012-May-29 08:03 UTC

[R] package tm: reading XML files

Dear fellow R users,

I'm using the package tm for text mining, and have a problem with 
reading in a corpus from XML files.
When I copy the example from "Introduction to the tm package" of the 
small reuters subset "crude", everything goes well, and I get a corpus
with the required meta data.
When I read in the entire reuters21578 corpus in XML format however (or 
a self-created subset thereof) the meta data is lost, and the files are 
interpreted as plain text.
I use the following command, where the indicated directory contains all 
reuters 21578 documents as separate XML files:

 > reuters21578 <- Corpus(DirSource("C:/Data/Reuters/preprocessed"),
 +                        readerContol=list(reader=readReut21578XML))

I'm running R2.15.0 under Windows XP.

Has anybody else encountered this problem and found a cause or solution?

Best regards,

-Ad Feelders

Milan Bouchet-Valat

2012-May-29 14:56 UTC

[R] package tm: reading XML files

On Tuesday, 29 May 2012 at 10:03 +0200, Ad Feelders wrote:
> Dear fellow R users,
> 
> I'm using the package tm for text mining, and have a problem with 
> reading in a corpus from XML files.
> When I copy the example from "Introduction to the tm package" of the
> small reuters subset "crude", everything goes well, and I get a corpus
> with the required meta data.
> When I read in the entire reuters21578 corpus in XML format however (or 
> a self-created subset thereof) the meta data is lost, and the files are 
> interpreted as plain text.
> I use the following command, where the indicated directory contains all 
> reuters 21578 documents as separate XML files:
> 
>  > reuters21578 <-
> Corpus(DirSource("C:/Data/Reuters/preprocessed"),
> readerContol=list(reader=readReut21578XML))

You have a typo in that command: "readerContol" should be "readerControl".
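
A likely reason the misspelled call ran without any error: in R, a named argument that matches no formal parameter is silently absorbed by `...` when the function's signature has one, so the default value is used instead. The sketch below illustrates this with `load_corpus`, a hypothetical stand-in that is not tm's code. It assumes (not confirmed in this thread) that `Corpus` accepts `...` in the same way. The corrected command from the thread follows as comments, since running it needs tm and the local Reuters files.

```r
# Hypothetical stand-in for a function whose signature ends in `...`.
# This is NOT tm's implementation, only an illustration of argument matching.
load_corpus <- function(x, readerControl = list(reader = "plain"), ...) {
  readerControl$reader
}

# Misspelled name: no error is raised. "readerContol" is not a prefix of
# "readerControl", so it falls into `...` and the default reader is used.
misspelled <- load_corpus("docs", readerContol = list(reader = "xml"))

# Correctly spelled name: the supplied reader is used.
corrected <- load_corpus("docs", readerControl = list(reader = "xml"))

print(misspelled)
print(corrected)

# Corrected command from the thread (requires tm and the local XML files,
# and a tm version that provides readReut21578XML, as used above):
# library(tm)
# reuters21578 <- Corpus(DirSource("C:/Data/Reuters/preprocessed"),
#                        readerControl = list(reader = readReut21578XML))
```

If the same mechanism applies to `Corpus`, this would explain why the files were read with the default plain-text reader and the meta data was lost.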


My two cents
