I'm new to R. I'm working with the text mining package tm. I have several plain text documents in a directory, and I would like to read all the files with extension .txt in that directory into a vector, one text document per vector element. That is, v[1] would be the first document, v[2] the second, etc.

I know how to read the documents into a tm Corpus, but that's not what I want to do. I would think that this kind of operation should be elementary and the first step in any text mining.

Thanks,
Richard

--
View this message in context: http://www.nabble.com/How-to-read-plain-text-documents-into-a-vector--tp25867792p25867792.html
Sent from the R help mailing list archive at Nabble.com.
Richard Liu wrote:
> I'm new to R. I'm working with the text mining package tm. I have
> several plain text documents in a directory, and I would like to read all
> the files with extension .txt in that directory into a vector, one text
> document per vector element. That is, v[1] would be the first document,
> v[2] the second, etc.
>
> I know how to read the documents into a tm Corpus, but that's not what I
> want to do. I would think that this kind of operation should be
> elementary and the first step in any text mining.

Reading unstructured files is not that common in R, so tm provides special methods for it. The vignette tm.pdf that ships with tm explains this on the first page.

Dieter
Richard Liu wrote:
> I'm new to R. I'm working with the text mining package tm. I have several
> plain text documents in a directory, and I would like to read all the files
> with extension .txt in that directory into a vector, one text document per
> vector element. That is, v[1] would be the first document, v[2] the second,
> etc.
>
> I know how to read the documents into a tm Corpus, but that's not what I
> want to do. I would think that this kind of operation should be elementary
> and the first step in any text mining.
>
> Thanks,
> Richard

Hi Richard,

Try something along these lines:

    # full.names = TRUE returns complete paths, so the read function
    # finds the files regardless of the current working directory
    file_list = list.files("/where/are/the/files", full.names = TRUE)
    obj_list = lapply(file_list, FUN = yourfunction)

yourfunction is probably either read.table or some read function from the tm package. So obj_list will become a list of either data.frames or tm objects.

cheers,
Paul

--
Drs. Paul Hiemstra
Department of Physical Geography
Faculty of Geosciences
University of Utrecht
Heidelberglaan 2
P.O. Box 80.115
3508 TC Utrecht
Phone: +3130 274 3113 Mon-Tue
Phone: +3130 253 5773 Wed-Fri
http://intamap.geo.uu.nl/~paul
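
For the character vector asked about in the original question (one whole document per element), the list.files/lapply pattern above can be adapted with base R alone. What follows is only a sketch under some assumptions: read_txt_dir is a hypothetical helper name, not part of tm or base R; the demo directory and file contents are made up for illustration; and each document is assumed small enough to hold in memory as a single string.

```r
# Hypothetical helper (not part of tm or base R): read every .txt file in
# `dir` into a character vector, one whole document per element.
read_txt_dir <- function(dir) {
  # pattern keeps only names ending in .txt; full.names returns usable paths
  files <- list.files(dir, pattern = "\\.txt$", full.names = TRUE)
  docs <- vapply(
    files,
    # readLines gives one element per line, so join the lines back together
    function(f) paste(readLines(f, warn = FALSE), collapse = "\n"),
    character(1)
  )
  names(docs) <- basename(files)
  docs
}

# Self-contained demo: a temporary directory with two made-up files
demo_dir <- file.path(tempdir(), "txt_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines(c("first document", "second line"), file.path(demo_dir, "a.txt"))
writeLines("second document", file.path(demo_dir, "b.txt"))

v <- read_txt_dir(demo_dir)
v[1]  # the first document as a single string
```

vapply is used rather than lapply so that the result is a plain character vector instead of a list; the file names are kept as element names so each document can be traced back to its source file.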