Julien Velcin
2012-Jan-13 14:49 UTC
[R] Troubles with stemming (tm + Snowball packages) under MacOS
Dear all,

I am having trouble using the stemming algorithm provided by the tm (text mining) + Snowball packages. Here is my configuration:

MacOS 10.5
R 2.12.0 / R 2.13.1 / R 2.14.1 (I have tried several versions)

I have installed all the needed packages (tm, rJava, RWeka, Snowball) plus dependencies. I have deactivated AWT (as described in http://r.789695.n4.nabble.com/Problem-with-Snowball-amp-RWeka-td3402126.html) with:

Sys.setenv(NOAWT=TRUE)

The command tm_map(reuters, stemDocument) gives the following errors:

- First time:
Error in .jnew(name) :
  java.lang.InternalError: Can't start the AWT because Java was started on the first thread. Make sure StartOnFirstThread is not specified in your application's Info.plist or on the command line
Refreshing GOE props...

- Second time:
Stemmer 'porter' unknown!
Stemmer 'english' unknown!
Stemmer 'porter' unknown!
Stemmer 'english' unknown!
Stemmer 'porter' unknown!
Stemmer 'english' unknown!
Stemmer 'porter' unknown!
Stemmer 'english' unknown!
Stemmer 'porter' unknown!
Stemmer 'english' unknown!
(etc.)

I have already searched the Web for a solution, but I have found nothing useful. Here is the full source code (all the libraries are already loaded):

------
Sys.setenv(NOAWT=TRUE)
source <- ReutersSource("reuters-21578.xml", encoding="UTF-8")
reuters <- Corpus(source)
reuters <- tm_map(reuters, as.PlainTextDocument)
reuters <- tm_map(reuters, removePunctuation)
reuters <- tm_map(reuters, tolower)
reuters <- tm_map(reuters, removeWords, stopwords("english"))
reuters <- tm_map(reuters, removeNumbers)
reuters <- tm_map(reuters, stripWhitespace)
reuters <- tm_map(reuters, stemDocument)
------

Thank you for your help,
Julien
Milan Bouchet-Valat
2012-Jan-15 14:52 UTC
[R] Troubles with stemming (tm + Snowball packages) under MacOS
On Friday 13 January 2012 at 15:49 +0100, Julien Velcin wrote:
> Dear all,
>
> I have some troubles using the stemming algorithm provided by the tm
> (text mining) + Snowball packages.
> Here is my config:
>
> MacOS 10.5
> R 2.12.0 / R 2.13.1 / R 2.14.1 (I have tried several versions)
>
> I have installed all the needed packages (tm, rJava, RWeka, Snowball)
> + dependencies. I have deactivated AWT (as described in http://r.789695.n4.nabble.com/Problem-with-Snowball-amp-RWeka-td3402126.html)
> with:
>
> Sys.setenv(NOAWT=TRUE)
>
> The command tm_map(reuters, stemDocument) gives the following errors:
>
> - First time:
> Error in .jnew(name) :
>   java.lang.InternalError: Can't start the AWT because Java was
> started on the first thread. Make sure StartOnFirstThread is not
> specified in your application's Info.plist or on the command line
> Refreshing GOE props...

In my experience, there's no clean solution to this problem for now. There's a good workaround, though: run your code from JGR, which is a GUI written in Java. Snowball works well this way.

Cheers
Julien Velcin
2012-Jan-15 22:06 UTC
[R] Troubles with stemming (tm + Snowball packages) under MacOS
I use version 1.6:

$ java -version
java version "1.6.0_26"

Julien

On Jan 15, 2012, at 8:55 PM, Milan Bouchet-Valat wrote:
> On Sunday 15 January 2012 at 16:32 +0100, Julien Velcin wrote:
>> Unfortunately, it doesn't work. I've installed JGR and launched my
>> script. I still obtain an error:
>>
>> Error in .jcall("RWekaInterfaces", "[S", "stem", .jcast(stemmer, "weka/core/stemmers/Stemmer"), :
>>   RcallMethod: cannot determine object class
>>
>> Any new idea?
> Just a guess, but what version of Java do you have? You can find this in
> the Java preferences panel (type "Java" in Spotlight to find it). 1.6 is
> required, and often only 1.5 is used by default on OS X.
>
> Regards
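The Java version check discussed above can also be done from within R. The following is a minimal sketch, not something posted in the thread: the helper names parse_java_version, java_at_least and java_version_lines are hypothetical, and it assumes the java found on the PATH is the same one R will use, which may not hold on every OS X setup.

```r
# Hypothetical helper (not from the thread): extract the version string
# from a line such as: java version "1.6.0_26"
parse_java_version <- function(lines) {
  hits <- regmatches(lines, regexpr('(?<=version ")[^"]+', lines, perl = TRUE))
  if (length(hits) == 0) NA_character_ else hits[1]
}

# Hypothetical helper: compare only the leading major.minor part,
# because numeric_version() rejects suffixes such as "_26".
java_at_least <- function(version, minimum = "1.6") {
  if (is.na(version)) return(NA)
  major_minor <- sub("^([0-9]+(\\.[0-9]+)?).*$", "\\1", version)
  numeric_version(major_minor) >= numeric_version(minimum)
}

# `java -version` writes to stderr, so both streams are captured.
# Returns character(0) if no java executable is found or the call fails.
java_version_lines <- function() {
  if (!nzchar(Sys.which("java"))) return(character(0))
  tryCatch(
    suppressWarnings(system2("java", "-version", stdout = TRUE, stderr = TRUE)),
    error = function(e) character(0)
  )
}

installed <- parse_java_version(java_version_lines())
if (is.na(installed)) {
  message("Could not determine the Java version from the command line.")
} else {
  message("Java ", installed, " found; at least 1.6: ", java_at_least(installed))
}
```

With the version string reported in this message, java_at_least("1.6.0_26") evaluates to TRUE, so the Java version alone would not explain the error.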
Zhou Zhou
2012-Feb-02 15:36 UTC
[R] Troubles with stemming (tm + Snowball packages) under MacOS
The Sys.setenv(NOAWT=TRUE) code indeed solved my problem, which was exactly what Julien described.

The key is that you have to deactivate AWT BEFORE loading RWeka/Snowball. If I do so, it still prints a few warning messages, but they should not affect anything. I am running the lsa package, which requires RWeka and Snowball. My R version is 2.14.1, under Mac OS X 10.6.8. My code snippet is below:

> dtm<-textmatrix(ldir,minWordLength=1,stopwords=stopwords_en,stemming=TRUE,language="english")
Refreshing GOE props...
---Registering Weka Editors---
Trying to add database driver (JDBC): RmiJdbc.RJDriver - Warning, not in CLASSPATH?
Trying to add database driver (JDBC): jdbc.idbDriver - Warning, not in CLASSPATH?
Trying to add database driver (JDBC): org.gjt.mm.mysql.Driver - Warning, not in CLASSPATH?
Trying to add database driver (JDBC): com.mckoi.JDBCDriver - Warning, not in CLASSPATH?
Trying to add database driver (JDBC): org.hsqldb.jdbcDriver - Warning, not in CLASSPATH?
[KnowledgeFlow] Loading properties and plugins...
[KnowledgeFlow] Initializing KF...

Julien Velcin wrote
> I have deactivated AWT (as described in
> http://r.789695.n4.nabble.com/Problem-with-Snowball-amp-RWeka-td3402126.html)
> with:
>
> Sys.setenv(NOAWT=TRUE)
>
> The command tm_map(reuters, stemDocument) gives the following errors:

--
View this message in context: http://r.789695.n4.nabble.com/Troubles-with-stemming-tm-Snowball-packages-under-MacOS-tp4292605p4351779.html
Sent from the R help mailing list archive at Nabble.com.
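The ordering described in this message can be sketched as follows. This is only an illustration under stated assumptions: load_if_available is a hypothetical helper, not part of tm, RWeka or Snowball, and the package list simply mirrors the packages named in the thread. The point it shows is that NOAWT is set in a fresh session first, and the Java-backed packages are attached only afterwards.

```r
# Packages named in the thread that start or use the Java VM.
java_pkgs <- c("rJava", "RWeka", "Snowball")

# If any of them is already loaded, setting NOAWT now may be too late;
# restarting R and running this script first is the safer route.
already_loaded <- intersect(java_pkgs, loadedNamespaces())
if (length(already_loaded) > 0) {
  message("Already loaded before NOAWT was set: ",
          paste(already_loaded, collapse = ", "))
}

# Step 1: disable AWT BEFORE any Java-backed package is loaded.
Sys.setenv(NOAWT = "TRUE")

# Hypothetical helper (not from the thread): attach a package only if it
# is installed and loads cleanly, so the sketch runs on any machine.
load_if_available <- function(pkg) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    suppressPackageStartupMessages(library(pkg, character.only = TRUE))
    TRUE
  } else {
    message("Package not available, skipping: ", pkg)
    FALSE
  }
}

# Step 2: only now load the Java-backed packages, then tm.
loaded <- vapply(c(java_pkgs, "tm"), load_if_available, logical(1))
```

In the script from the first message, all libraries were loaded before Sys.setenv(NOAWT=TRUE) was called, which is the opposite of the order shown here.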
Julien Velcin
2012-Feb-05 00:30 UTC
[R] Troubles with stemming (tm + Snowball packages) under MacOS
THANK YOU!

Indeed, the key is to disable AWT before loading the R packages. At last, it works with just a few warnings.

Julien

On Feb 2, 2012, at 4:36 PM, Zhou Zhou wrote:
> The Sys.setenv(NOAWT=TRUE) code indeed solved my problem, which was
> exactly what Julien described.
>
> The key is that you have to deactivate AWT BEFORE loading RWeka/Snowball.
> If I do so, it still prints a few warning messages, but they should not
> affect anything. I am running the lsa package, which requires RWeka and
> Snowball. My R version is 2.14.1, under Mac OS X 10.6.8. My code snippet
> is below:
>
>> dtm<-textmatrix(ldir,minWordLength=1,stopwords=stopwords_en,stemming=TRUE,language="english")