I am familiar with packages that read and write Excel files on both Windows and Linux platforms.

Do any packages provide similar functionality for MS Word files? I have a lot of text processing to do and the text is embedded in ~200 different Word files (.doc format, Office 2003). All I need to do is read, not write.

Thanks,
Mark

Mark W. Kimpel MD
Neuroinformatics
Dept. of Psychiatry
Indiana University School of Medicine
15032 Hunter Court, Westfield, IN 46074
(317) 490-5129 Work, & Mobile & VoiceMail

"The real problem is not whether machines think but whether men do." -- B. F. Skinner

Hello,

On 8/17/09, Mark Kimpel <mwkimpel at gmail.com> wrote:
> I am familiar with packages that read and write Excel files on both Windows
> and Linux platforms.
>
> Do any packages provide similar functionality for MS Word files? I have a
> lot of text processing to do and the text is embedded in ~200 different Word
> files (.doc format Office 2003). All I need to do is read, not write.

Perhaps export to .txt or .html via OpenOffice and then read and process in R?

Liviu
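
A minimal sketch of the "read and process in R" half of that suggestion, assuming the Word files have already been exported to plain text by some other means and sit together in one directory. The function name read_exported_txt and the UTF-8 default are illustrative assumptions, not something from the thread:

```r
# Read every exported .txt file in a directory into a named list.
# Each list element is a character vector with one entry per line of text.
# NOTE: read_exported_txt is a hypothetical helper; adjust `encoding`
# to whatever the export step actually produced.
read_exported_txt <- function(dir, encoding = "UTF-8") {
  files <- list.files(dir, pattern = "\\.txt$", full.names = TRUE,
                      ignore.case = TRUE)
  texts <- lapply(files, readLines, warn = FALSE, encoding = encoding)
  names(texts) <- basename(files)
  texts
}
```

Calling read_exported_txt("aDirectoryContainingTheExportedFiles") would then give one element per file, which can be passed on to grep(), strsplit() or similar for the actual text processing.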

On Tue, Aug 18, 2009 at 12:00:07PM +0200, Mark Kimpel wrote:
> I am familiar with packages that read and write Excel files on both Windows
> and Linux platforms.
>
> Do any packages provide similar functionality for MS Word files? I have a
> lot of text processing to do and the text is embedded in ~200 different Word
> files (.doc format Office 2003). All I need to do is read, not write.

See readDOC in package tm. E.g., something like

    Corpus(DirSource("aDirectoryContainingTheWordFiles"),
           readerControl = list(reader = readDOC))

Note that you need antiword (http://www.winfield.demon.nl/) in your path such that readDOC can use it.

Best regards, Ingo

--
Ingo Feinerer
Vienna University of Technology
http://www.dbai.tuwien.ac.at/staff/feinerer
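
The readDOC approach above could be wrapped roughly as follows. This is only a sketch under several assumptions: a reasonably recent tm release (where VCorpus is available and DirSource accepts the pattern, ignore.case and mode arguments), antiword installed and on the PATH, and a hypothetical helper name read_doc_dir that does not come from the thread:

```r
# Read all .doc files in a directory via tm::readDOC and return the text
# of each document as a character vector.
# NOTE: read_doc_dir is a hypothetical wrapper; it assumes a recent tm
# version and an antiword binary reachable through the PATH.
read_doc_dir <- function(dir) {
  if (!requireNamespace("tm", quietly = TRUE)) {
    stop("package 'tm' is not installed")
  }
  if (!nzchar(Sys.which("antiword"))) {
    stop("antiword was not found on the PATH")
  }
  corpus <- tm::VCorpus(
    # mode = "" hands only the file paths to the reader, so the binary
    # .doc files are not read as text before antiword sees them
    tm::DirSource(dir, pattern = "\\.doc$", ignore.case = TRUE, mode = ""),
    readerControl = list(reader = tm::readDOC)
  )
  # Extract the plain text of every document in the corpus
  lapply(corpus, as.character)
}
```

The two checks at the top are there because a missing antiword is the most likely reason for this to fail; with older tm versions the original Corpus(...) call from the message may be the one to use instead.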