I'm running tm 0.5 on R 2.9.2 on a MacBook Pro 17" (unibody, early 2009, 2.93 GHz, 4 GB RAM). I have a directory with 1697 plain-text files on the Mac that I want to analyze with the tm package. I have read the documents into a corpus, Corpus_3compounds, as follows:

# Assign the directory to a character vector
dirName <- "/Volumes/RDR Test Documents/3Compounds/TXT"

# Put the paths of the .txt files in the directory into a vector
Files_3compounds <- dir(dirName, full.names = TRUE, pattern = "_.*\\.txt",
                        ignore.case = TRUE)

# Create a DirSource object for the same directory and file pattern
Dir_3compounds <- DirSource(dirName, pattern = "_.*\\.txt", ignore.case = TRUE,
                            encoding = "latin1")

# Read the .txt files into a volatile corpus
Corpus_3compounds <- Corpus(Dir_3compounds,
                            readerControl = list(reader = readPlain,
                                                 language = "en", load = TRUE))

I have the metadata for these text documents in an Excel table, which I have read into Metadata_3compounds as follows:

# Read the metadata into a data frame (read.xls from the gdata package)
Metadata_3compounds <- read.xls("/Volumes/RDR Test Documents/3Compounds/3compounds.xls",
                                sheet = 3, verbose = TRUE, pattern = "Document",
                                method = "tab", perl = "perl")

Since the metadata and the text documents in the corpus are not in the same order, I have to create an index between the two. The file name contains the document ID, so I match on that:

# For each document in the corpus, find the row index of its metadata
# in Metadata_3compounds (the document ID is embedded in the file name)
iMyMetadata <- match(gsub("^(.*)/_(.*)\\.txt$", "\\2", Files_3compounds, perl = TRUE),
                     Metadata_3compounds$Document.No)

The metadata data frame has the following names:

 [1] "Document.No"               ...
 [5] ...
 [9] "total"                     "SET"                       "CAT1"                      "CAT2"
[13] "Title"                     "Approved.By"               "Author.s."                 "Center"
[17] "Comment"                   "Date.Approved"             "Date.Submitted"            "Department"
[21] "Division"                  "Document.Class"            "Document.Date"             "Document.No.1"
[25] "Language"                  "Pages"                     "Project.ID..Theme.Number." "Rapid.Document"
[29] "Report.No"                 "Study.Protocol.No"         "Submitted.By"              "Substance.ID"

Now I want to assign this metadata to the local metadata of the documents in the corpus, for example as follows:

# Transfer the document IDs to the documents' local metadata
meta(Corpus_3compounds, type = "local", tag = "DocId") <- Metadata_3compounds$Document.No[iMyMetadata]

I let this statement run for more than twenty minutes before deciding to stop it; I just cannot imagine that it should take anywhere near that long. If I assign the same vector to the indexed metadata of the corpus instead, it finishes in little more than the blink of an eye. When I limit the corpus to five documents, I can verify that the code is correct. (I sketch both of these in the P.S. below.)

QUESTIONS: Is it normal for this operation to take so long on a corpus of 1697 documents? Is there a quicker way of accomplishing the same thing?

I really do want to store the metadata with the documents, i.e., as local metadata. I am uncertain about the advantages, but I would think that, if I delete or filter out a document, its metadata is deleted or filtered along with it. Furthermore, when I cluster the documents or train a machine learner on them, I could imagine -- but I do not know for sure -- that it would be easier to use local metadata as a feature, whereas that might not be so easy with indexed metadata.

Regards,
Richard Liu
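
P.S. For reference, the fast indexed-metadata assignment I mention above looks roughly like this (written from memory, so the exact form may differ slightly):

# Assigning the same vector to the corpus-level (indexed) metadata
# completes almost immediately
meta(Corpus_3compounds, type = "indexed", tag = "DocId") <- Metadata_3compounds$Document.No[iMyMetadata]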
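
The five-document check is essentially the same local-metadata assignment on a small sub-corpus, roughly:

# Sanity check on a five-document subset before touching all 1697 files
SmallCorpus <- Corpus_3compounds[1:5]
meta(SmallCorpus, type = "local", tag = "DocId") <- Metadata_3compounds$Document.No[iMyMetadata][1:5]
meta(SmallCorpus[[1]])   # inspect the local metadata of the first document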
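
One thing I have not yet tried is assigning the metadata document by document; I do not know whether something like the following would behave differently, or even whether this is the right way to set local metadata on a single document in tm 0.5:

# Untimed sketch: set the DocId in each document's local metadata in a loop
DocIds <- Metadata_3compounds$Document.No[iMyMetadata]
for (i in seq_along(Corpus_3compounds)) {
    meta(Corpus_3compounds[[i]], tag = "DocId") <- DocIds[i]
}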