similar to: Efficiently Extracting Meta Data from TM Corpora


2009 Dec 24
serving crossdomain.xml from icecast web-root
Eric, David: The way to do that is with a proxy script. The script goes out to your source server, reads the XML, and then relays it from your local server. You can find more information about this by going to: Or, if you just want to try a simple one, create a file called proxy.php and paste this code into it:
2012 Jun 12
Fwd: [Corpora-List] ACM SIGIR 2012 Workshop on Open Source Information Retrieval
This might be an interesting option for some of you! Regards, Parth.
---------- Forwarded message ----------
From: Andrew Trotman <andrew at>
Date: Tue, Jun 12, 2012 at 5:12 AM
Subject: [Corpora-List] ACM SIGIR 2012 Workshop on Open Source Information Retrieval
To: corpora at
ACM SIGIR 2012 WORKSHOP ON OPEN SOURCE INFORMATION RETRIEVAL
16 August 2012, Portland,
2009 Oct 15
Problems with rJava and tm packages
I am looking to do some text analysis using R and have run into some issues with some of the packages. I'm not sure if it's my goofy Vista OS or what, but using R 2.8.1 I was relatively successful loading the text, but the rJava package was messed up somehow: library(tm) > library(rJava) Error in if (!nchar(javahome)) stop("JAVA_HOME is not set and could not be determined from the
2011 Sep 02
Classifying large text corpora using R
Dear everyone, I am new to R, and I am looking at doing text classification on a huge collection of documents (>500,000) distributed among 300 classes (so basically, this is my training data). Would someone please be kind enough to let me know which R packages to use and how well they scale (time and space)? I am very new to R and do not know the right packages to use. I
2009 Jan 15
How to Solve the Error( error:cannot allocate vector of size 1.1 Gb)
Hi Gurus, Thanks to your kind help, I have managed to start using the text mining package "tm" in R on Windows XP. However, while running tm, I ran into another problem, this time with memory. What is the best way to solve this memory problem: adding physical RAM, or some other remedy? ###### my R
2005 Dec 08
perl module
Hi list, I have trouble installing the following Perl modules on my CentOS 4.2 server and need help: Digest::SHA1, Digest::HMAC, Net::DNS, Time::HiRes, HTML::Tagset, HTML::Parser, Pod::Usage, Parse::Syslog, Statistics::Distributions. I tried "perl -MCPAN -e shell" and then "install Pod::Usage", and I got the following errors: CPAN: Storable loaded ok Going to read /root/.cpan/Metadata
2011 Nov 17
merging corpora and metadata
Greetings! I lose all my metadata after concatenating corpora. This is an example of what happens:
> meta(corpus.1)
   MetaID cid fid selfirst selend                      fname
1       0   1  11     2169   2518 WCPD-2001-01-29-Pg217.scrb
2       0   1  14     9189   9702  WCPD-2003-01-13-Pg39.scrb
3       0   1  14     2109   2577  WCPD-2003-01-13-Pg39.scrb
....
17      0
2007 Nov 05
Ajax - attachment_fu - rmagick and iframe
Hello, I am uploading images with Ajax, attachment_fu, RMagick and an iframe, and they upload correctly. But when the process finishes and the controller does redirect_to enc_url, I want it to reload the Ajax layer. I tried redirect_to enc_url :partial => 'm_content' and redirect_to enc_url, :update => 'm_content', but neither one reloads it ... rhtml ------ <% form_for(photo,
2018 Mar 14
clusterApply arguments
This is nothing specific to parallel::clusterApply() per se. It is R's default behavior, which allows partial argument names. I don't think there's much that can be done here except always using fully named arguments to the "apply" function itself, as you show. You can "alert" yourself when there's a mistake by using: options(warnPartialMatchArgs =
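A minimal sketch of the mechanism, using a hypothetical stand-in function `f()` whose formals mirror the shape of `parallel::clusterApply(cl = NULL, x, fun, ...)`. No cluster is started. R matches named arguments exactly first, then by partial name (only against formals that come before `...`), and only then by position. That order is why a stray `c = 1` can be captured by `cl`. Naming `cl` explicitly avoids this, and `options(warnPartialMatchArgs = TRUE)` makes R warn about it.

```r
# Hypothetical stand-in whose formals mirror clusterApply(cl = NULL, x, fun, ...).
# It only reports what was bound to `cl` and what ended up in `...`.
f <- function(cl = NULL, x, fun, ...) {
  list(cl = cl, dots = list(...))
}

# `c` is a unique prefix of `cl`, a formal that precedes `...`, so R
# partially matches it: `c = 1` is bound to `cl` instead of going to `...`.
bad <- f(x = 1:2, fun = identity, c = 1)
str(bad$cl)         # num 1
length(bad$dots)    # 0

# Once `cl` is supplied by its full name, `c` can no longer match it
# and is passed through `...` as intended.
good <- f(cl = NULL, x = 1:2, fun = identity, c = 1)
is.null(good$cl)    # TRUE
good$dots$c         # 1

# With warnPartialMatchArgs = TRUE, the partial match raises a warning,
# which is caught here and its message kept.
old <- options(warnPartialMatchArgs = TRUE)
msg <- tryCatch(
  { f(x = 1:2, fun = identity, c = 1); "no warning" },
  warning = function(w) conditionMessage(w)
)
options(old)        # restore the previous setting
msg                 # contains "partial argument match"
```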
2018 Mar 15
clusterApply arguments
On Thu, Mar 15, 2018 at 3:39 AM, <FlorianSchwendinger at> wrote: > Thank you for your answer! > I agree with you except for the 3 (Error) example and > I realize now I should have started with that in the explanation. > > From my point of view > parLapply(cl = clu, X = 1:2, fun = fun, c = 1) > shouldn't give an error. > > This could be easily avoided by
2018 Mar 15
clusterApply arguments
On 03/15/2018 05:25 PM, Henrik Bengtsson wrote: > On Thu, Mar 15, 2018 at 3:39 AM, <FlorianSchwendinger at> wrote: >> Thank you for your answer! >> I agree with you except for the 3 (Error) example and >> I realize now I should have started with that in the explanation. >> >> From my point of view >> parLapply(cl = clu, X = 1:2, fun = fun, c =
2018 Mar 15
clusterApply arguments
Thank you for your answer! I agree with you except for the 3 (Error) example, and I realize now I should have started with that in the explanation. From my point of view parLapply(cl = clu, X = 1:2, fun = fun, c = 1) shouldn't give an error. This could be easily avoided by using all the argument names in the clusterApply call of parLapply, which means changing, parLapply <-
2010 Dec 17
[R-sig-hpc] Error in makeMPIcluster(spec, ...): how to get a minimal example for parallel computing with doSNOW to run?
Shouldn't -n be 4 in the bsub command? One master + 3 slaves. This was required for snowfall, but I think doSNOW is similar. Hope it helps, Mario. On 16-Dec-10 23:09, Marius Hofert wrote: > Dear expeRts, > > I try to get a minimal example for parallel computing via "foreach" + "doSNOW" to run on a computer cluster (Brutus from
2018 Mar 14
clusterApply arguments
Hi! I noticed that the argument matching of clusterApply (and therefore parLapply) goes wrong when one of the arguments of the function is called "c". In this case, the argument "c" is used as the cluster, and the functions give the following error message: "Error in checkCluster(cl) : not a valid cluster". Of course, "c" is for many reasons an unfortunate
2008 Mar 27
snow, stopping cluster
Hello, is there any function in the snow package to check whether a cluster is really running? The function checkCluster only checks the variable cl, and the variable is still available after stopping the cluster! (A simple solution would be to delete the cluster variable cl in the function stopCluster.) > library(snow) > cl <- makeCluster(5) 5 slaves are spawned successfully. 0
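A minimal sketch of such a check. It is written against the `parallel` package (which grew out of snow) rather than snow itself, so it is an assumption that the same approach carries over to snow. The hypothetical helper `cluster_is_running()` sends a trivial expression to every node and returns FALSE if that round trip fails, for example because `stopCluster()` has already closed the node connections while the `cl` object still exists.

```r
library(parallel)

# Hypothetical helper: TRUE if every node of `cl` answers a trivial call,
# FALSE if the round trip fails (e.g. after stopCluster() has closed the
# node connections, even though the `cl` object itself still exists).
cluster_is_running <- function(cl) {
  tryCatch({
    res <- clusterEvalQ(cl, TRUE)
    length(res) == length(cl) && all(vapply(res, isTRUE, logical(1)))
  }, error = function(e) FALSE)
}

cl <- makeCluster(2)
before <- cluster_is_running(cl)   # expected TRUE: both workers respond
stopCluster(cl)
after <- cluster_is_running(cl)    # expected FALSE: `cl` exists, its nodes do not
```

Probing the nodes costs one round trip per call. That is cheap for an occasional sanity check, but it is not something to run inside a tight loop.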
2006 Apr 28
available: google sitemap for rails project
Hi, Google Sitemaps ( is a way to help Google's crawler on your website. I've published a little script to generate, from a rails project, a urllist file usable with the Google sitemap generator. You drop the script in your RAILS_ROOT/lib directory, edit it to set the base url, and you're set. Run it with ruby script/runner
2024 Mar 25
Wish: a way to track progress of parallel operations
Hello R-devel, A function to be run inside lapply() or one of its friends is trivial to augment with side effects to show a progress bar. When the code is intended to be run on a 'parallel' cluster, it generally cannot rely on its own side effects to report progress. I've found three approaches to progress bars for parallel processes on CRAN: - Importing 'snow' (not
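One simple approach keeps all reporting on the master process. It is not necessarily any of the three approaches the poster found on CRAN. The input is split into chunks, each chunk is run with `parLapply()`, and a `txtProgressBar()` advances after each chunk returns. The helper `par_lapply_progress()` and its `n_chunks` parameter are hypothetical names used only for this sketch.

```r
library(parallel)

# Hypothetical helper: apply `fun` to `X` on cluster `cl` in `n_chunks`
# sequential batches, advancing a progress bar on the master after each
# batch returns. Workers never report progress themselves.
par_lapply_progress <- function(cl, X, fun, n_chunks = 10L) {
  if (length(X) == 0L) return(list())
  n_chunks <- max(1L, min(as.integer(n_chunks), length(X)))
  idx <- splitIndices(length(X), n_chunks)   # list of index vectors
  pb <- txtProgressBar(min = 0, max = length(idx), style = 3)
  on.exit(close(pb))
  out <- vector("list", length(idx))
  for (i in seq_along(idx)) {
    out[[i]] <- parLapply(cl, X[idx[[i]]], fun)  # blocks until batch i is done
    setTxtProgressBar(pb, i)
  }
  do.call(c, out)   # flatten the per-batch lists into one list
}

# Usage: square 1..20 on two local workers, in four batches.
cl <- makeCluster(2)
res <- par_lapply_progress(cl, 1:20, function(x) x^2, n_chunks = 4)
stopCluster(cl)
```

Each batch acts as a barrier. Workers that finish early sit idle until the whole batch returns. More batches therefore give finer-grained progress at the cost of more round trips and some lost load balancing.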
2009 Jul 17
Help with the text mining package (TM)
Dear all, I am writing to ask about the following: I am doing a text mining project and need to import a set of texts in order to preprocess them, that is, remove stopwords, apply stemming, remove punctuation, etc. I can do the latter with the datasets that come with the TM library. What I cannot manage is to import text from any source, even though there are functions
2024 Mar 25
Wish: a way to track progress of parallel operations
Hello, thanks for bringing this topic up, and it would be excellent if we could come up with a generic solution for this in base R. It is one of the most frequently asked questions and requested features in parallel processing, but also in sequential processing. We have also seen lots of variants on how to attack the problem of reporting on progress when running in parallel. As the author
2009 Sep 15
S3 objects in S4 slots
Hello, I am the maintainer of the stringkernels package and have come across a problem with using S3 objects in my S4 classes. Specifically, I have an S4 class with a slot that takes a text corpus as a list of character vectors. tm (version 0.5) saves corpora as lists with a class attribute of c("VCorpus", "Corpus", "list"). I don't actually need the