Walter Rojas
2007-Aug-18 22:31 UTC
[R] Problem with lsa package (data.frame) on Windows XP
Dear R team,

The following piece of code (to use the lsa package) works fine on my Mac OS X, but when I run the same code on Windows XP, it doesn't work any more.

### code:
library("lsa")
matrix1 = textmatrix("C:\\Documents and Settings\\tine stalmans.TINE.000\\LSA\\cuentos\\", stemming=TRUE, language="spanish", minWordLength=2, minDocFreq=1, stopwords=NULL, vocabulary=NULL)
print(matrix1, bag_lines = 3, bag_cols = 3)
matrix1 = lw_bintf(matrix1) * gw_idf(matrix1)
space = lsa(matrix1, dims = dimcalc_share())
as.textmatrix(space)

### the following line fails on windows XP
matrix2 = textmatrix("C:\\Documents and Settings\\tine stalmans.TINE.000\\LSA\\respuestas\\", stemming=TRUE, language="spanish", minWordLength=2, minDocFreq=1, stopwords=NULL, vocabulary=rownames(matrix1))
matrix2 = lw_bintf(matrix2)
matrix2fld = fold_in(matrix2, space)
r <- cor(matrix2fld[,"respId1.txt"], matrix2fld[,"respAl1.txt"], method = "pearson")
print(r)

An error occurs when creating the second textmatrix with the vocabulary of the first. The error I get is:

in data.frame(docs = basename(file), terms = names(tab), Freq = tab, :
arguments imply differing number of rows: 1, 0

When I change the vocabulary argument to NULL, it doesn't report this error any more; however, then the code will fail on the fold_in method further down.

I found another user who reported this same problem on-line; however, I didn't find any answers.

Thank you very much in advance for your reply.
Tine.
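The wording of the error comes from base R's data.frame(), which raises it whenever one argument has length 1 and another has length 0. The sketch below reproduces only that message, without lsa. The assumption that names(tab) was empty for one file is a guess at the mechanism; the thread does not establish which file or setting produces the zero-length argument inside textmatrix().

```r
# data.frame() can recycle a length-1 argument, but not against a
# zero-length one.
docs  <- "respId1.txt"   # a single file name, as basename(file) would give
terms <- character(0)    # assumption: no terms were found for that file

# Capture the error message instead of stopping the script.
result <- tryCatch(
  data.frame(docs = docs, terms = terms),
  error = function(e) conditionMessage(e)
)
print(result)
```

If this is what happens in the failing call, checking each input file for terms that survive the vocabulary filter would be a reasonable next step.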
Please provide a reproducible example; it is almost impossible to help otherwise. Also, please provide all error messages and the output of traceback(), and tell us the versions of R and of the packages you are using.

If you are sure this is an error in the package, please send that reproducible example to the package maintainer.

Uwe Ligges
______________________________________________
R-help at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
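The version details and traceback requested in this reply can be collected with a few base R calls. This is a minimal sketch: the stats package stands in for the package of interest (an assumption, so that the snippet runs on any installation), and script.R is a hypothetical file name.

```r
# R version and platform, plus the packages currently attached.
print(R.version.string)
print(sessionInfo())

# Version of one installed package. "stats" is a stand-in so this sketch
# runs anywhere; replace it with the package being reported on.
pkg_version <- packageDescription("stats")$Version
print(pkg_version)

# traceback() shows the call stack of the most recent uncaught error, so it
# is typed at the prompt after the failure:
# > source("script.R")   # hypothetical script that stops with an error
# > traceback()
```

A traceback() call placed in the script after the failing line is never reached, because source() stops at the error.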
Tine Stalmans
2007-Aug-20 18:33 UTC
[R] Problem with lsa package (data.frame) on Windows XP
Dear Uwe,

Thanks very much for your prompt reply. I include the following pieces of information, alongside a zip file with two folders where the corpus resides.

###############################
## Full reproducible code:
###############################
library("lsa")

# load training text
matrix1 = textmatrix("C:\\Documents and Settings\\tine stalmans.TINE.000\\LSA\\cuentos\\", stemming=TRUE, language="spanish", minWordLength=2, minDocFreq=1, stopwords=NULL, vocabulary=NULL)
print(matrix1, bag_lines = 3, bag_cols = 3)
matrix1 = lw_bintf(matrix1) * gw_idf(matrix1) # weighting
space = lsa(matrix1, dims = dimcalc_share()) # create LSA space
#as.textmatrix(space)

# fold-in test and gold standard essays
matrix2 = textmatrix("C:\\Documents and Settings\\tine stalmans.TINE.000\\LSA\\respuestas\\", stemming=TRUE, language="spanish", minWordLength=2, minDocFreq=1, stopwords=NULL, vocabulary=rownames(matrix1))
matrix2 = lw_bintf(matrix2) # gives NaN if the idf is added, because it divides by 0
matrix2fld = fold_in(matrix2, space)
r <- cor(matrix2fld[,"respId1.txt"], matrix2fld[,"respAl1.txt"], method = "pearson") #use = "complete.obs", method = "pearson");
print(r)
###############################
# end code
###############################

I tried to run a traceback; however, when I included this command in the code, it didn't change the original error message.

###############################
# R output, including error message:
###############################
> source("C:\\Documents and Settings\\tine stalmans.TINE.000\\LSA\\lsa.R")
$matrix
              D1 D2 D3 D8 D9 D10 D13 D14 D15
1.    11       1  0  0  0  0   0   0   0   0
2.    1493     1  0  0  0  0   0   0   0   0
3.    1503     1  0  0  0  0   0   0   0   0
896.  voy      0  0  0  0  2   0   1   0   0
897.  vuelv    0  0  0  0  0   0   0   0   0
898.  yo       0  0  0  0  0   0   0   0   0
1790. unic     0  0  0  0  0   0   0   0   1
1791. verific  0  0  0  0  0   0   0   0   1
1792. vier     0  0  0  0  0   0   0   0   1

$legend
[1] "D1 = paraR_1.txt"  "D2 = paraR_10.txt" "D3 = paraR_11.txt"
[4] "D8 = paraR_2.txt"  "D9 = paraR_3.txt"  "D10 = paraR_4.txt"
[7] "D13 = paraR_7.txt" "D14 = paraR_8.txt" "D15 = paraR_9.txt"

Error in data.frame(docs = basename(file), terms = names(tab), Freq = tab, :
arguments imply differing number of rows: 1, 0
In addition: There were 16 warnings (use warnings() to see them)
###############################
# end output
###############################

R version: R 2.5.1 (running on Windows XP)
LSA package: lsa_0.57
Rstem package 0.3-0 (available at www.omegahat.org/Rstem/)

Thanks in advance for your advice.
Tina.

>From: "Uwe Ligges" <ligges at statistik.uni-dortmund.de>
>To: "Walter Rojas" <walterrojas at mac.com>
>Cc: <r-help at stat.math.ethz.ch>
>Date: August 19, 2007 08:45:28 AM PDT
>Subject: Re: [R] Problem with lsa package (data.frame) on Windows XP
>
>Please specify reproducible examples, it is almost impossible to help
>otherwise. Also, please provide all error messages and a traceback().
>Please tell us versions of R and versions of the packages you are using.
>If you are sure this is an error in the package, please send that
>reproducible example to the package maintainer.
>
>Uwe Ligges
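On the traceback() call that had no effect inside the script: the call stack can also be recorded from within the code at the moment the error is raised, using base R condition handling. This is a sketch only. The helper name run_with_stack is hypothetical, and the failing expression is a stand-in data.frame() call, not the textmatrix() call from the message above.

```r
# Evaluate an expression; if it fails, return the error together with the
# call stack that was active when the error was signalled.
run_with_stack <- function(expr) {
  stack <- NULL
  result <- tryCatch(
    withCallingHandlers(
      expr,
      # Runs before the stack is unwound, so sys.calls() still sees it.
      error = function(e) stack <<- sys.calls()
    ),
    # Afterwards, hand the error object back instead of stopping.
    error = function(e) e
  )
  list(result = result, stack = stack)
}

# Stand-in failure: arguments of incompatible lengths.
out <- run_with_stack(data.frame(a = 1:2, b = 1:3))
print(conditionMessage(out$result))
print(length(out$stack))   # number of recorded calls
```

Wrapping the failing line of a script in such a helper and printing out$stack gives roughly the information that traceback() would show at the prompt.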