I'm using the text mining package ("tm") to process a large number of blog
and message board postings (about 245,000). Does anyone have advice on how
to efficiently extract the metadata from a corpus of this size?
tm does a great job of using MPI for many functions (e.g. tmMap), which
greatly speeds up the processing. However, the "meta" function that I need
does not take advantage of MPI.
I have two ideas:
1) Find a way of running the meta function in parallel. Specifically, the
code I'm running is:
urllist <- lapply(workingcorpus, meta, tag = "FeedUrl")
Unfortunately, I receive the following error message when I try to use
parLapply instead:
"Error in checkCluster(cl) : not a valid cluster
Calls: parLapply ... is.vector -> clusterApply -> staticClusterApply ->
checkCluster"
2) Alternatively, I wonder if there might be a way of extracting all of the
metadata into a data.frame that would be faster to process?
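
For example, something along these lines, looping over the tags once and
binding the results into columns (the tag names are illustrative, and this
assumes every document actually carries each tag):

    tags <- c("FeedUrl", "Author")  # illustrative tag names
    cols <- lapply(tags, function(tg)
        sapply(workingcorpus, function(d) as.character(meta(d, tag = tg))))
    metadf <- data.frame(setNames(cols, tags), stringsAsFactors = FALSE)

But this still calls meta() once per document per tag, so I'm not sure it
would actually be any faster.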
Thanks for any suggestions or ideas!
Shad
shad thomas | president | glass box research company | +1 (312) 451-3611 tel |
shad.thomas@glassboxresearch.com | www.glassboxresearch.com