Jiayue Wang
2018-Dec-09 06:23 UTC
[R] "subscript out of bounds" error when using koRpus+Tree Tagger
Hi,

I'm trying to process a text corpus of novels with the koRpus package and Tree Tagger. The script lists all the txt files (11 in all) in a directory and processes them one by one.

##########
rm(list=ls())
library(koRpus)
library(koRpus.lang.en)
set.kRp.env(TT.cmd = "/pathto/tree-tagger-english", lang = "en")
outdir <- "/pathto/corpora"
corpdir <- paste0(outdir,"/","morrison11")
files <- list.files(path=corpdir, pattern = "*.txt", full.names = F)
n <- length(files)
output <- file(paste0(outdir,"/calc_results_morrison11.txt"), open="at")
for (i in 1:n) {
  cat(i," - ",files[i],"\n", file = output)
  tagged.results <- treetag(paste0(corpdir,'/',files[i]), treetagger="kRp.env")
  capture.output(flesch(tagged.results), file = output)
  cat("\n", file=output)
  capture.output(TTR(tagged.results), file = output)
  cat("\n", file=output)
  capture.output(textFeatures(tagged.results), file=output)
  cat("\n===========================\n", file = output)
}
close(output)
#########

The problem is that the script always throws the following error when it reaches the last txt file, and then exits prematurely:

Error in all.patterns[[word.length]] : subscript out of bounds

I can't figure out what this message means. The directory paths are correct, there's no problem with the Tree Tagger installation, and n and files have the correct values.

Please help, many thanks!

Jiayue
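One way to narrow this down is sketched below in base R only. It assumes the goal is to find out which of the three calls (flesch, TTR or textFeatures) raises the error on the last file. Since the message mentions word.length, it also assumes that unusually long tokens in that file might be worth a look; that is a guess, not something confirmed here. The helper names try_step and longest_tokens are hypothetical and not part of koRpus.

```r
# Hypothetical helper (not part of koRpus): run one step and report
# which step failed instead of aborting the whole loop.
try_step <- function(label, step_fun) {
  tryCatch(
    step_fun(),
    error = function(e) {
      message("step '", label, "' failed: ", conditionMessage(e))
      NULL  # signal failure to the caller without stopping the script
    }
  )
}

# Hypothetical helper: return the n longest whitespace-separated tokens
# in a text file, longest first.
longest_tokens <- function(path, n = 5) {
  txt <- readLines(path, warn = FALSE)
  tokens <- unlist(strsplit(txt, "[[:space:]]+"))
  tokens <- unique(tokens[nzchar(tokens)])
  lens <- nchar(tokens, type = "chars", allowNA = TRUE)
  head(tokens[order(lens, decreasing = TRUE)], n)
}

# Small self-contained demonstration with a temporary file.
demo_file <- tempfile(fileext = ".txt")
writeLines(c("a bb ccc", "supercalifragilistic dddd"), demo_file)
print(longest_tokens(demo_file, 2))

# A step that fails with the same kind of error is reported, not fatal.
failed <- try_step("demo", function() list(1, 2)[[5]])
print(is.null(failed))
```

Inside the loop, each call could be wrapped in the same way, for example try_step("flesch", function() flesch(tagged.results)), so that the output file shows which step fails for which file, and longest_tokens() could then be run on the file that triggers the error.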