search for: readdoc

Displaying 7 results from an estimated 7 matches for "readdoc".

2009 Aug 17
2
reading in MS Word files
I am familiar with packages that read and write Excel files on both Windows and Linux platforms. Do any packages provide similar functionality for MS Word files? I have a lot of text processing to do and the text is embedded in ~200 different Word files (.doc format, Office 2003). All I need to do is read, not write. Thanks, Mark
2009 Aug 05
2
reading and frequency analysis of Spanish text
For an historical paper I'm working on, I have some Spanish plaintext, presently in the form of a Word .doc file, http://euclid.psych.yorku.ca/SCS/Gallery/images/Private/Langren/Verdadera-spanish-stripped.doc and also some ciphered text from the same original source. The ultimate goal is to use some frequency analysis of letters and word lengths in the plaintext to help decode the
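The kind of letter and word-length frequency count described in this post can be sketched roughly as follows. This is only an illustration, not the poster's method: it assumes the text has already been extracted from the .doc file into a plain string, and the sample sentence and both function names are hypothetical placeholders.

```python
import re
from collections import Counter


def letter_frequencies(text):
    """Count alphabetic characters, case-folded.

    Accented letters (e.g. 'ñ', 'á') are counted as distinct letters.
    """
    return Counter(ch for ch in text.lower() if ch.isalpha())


def word_length_frequencies(text):
    """Count word lengths, treating each run of letters as one word."""
    words = re.findall(r"[^\W\d_]+", text)
    return Counter(len(word) for word in words)


# Placeholder sample; the real input would be the extracted plaintext.
sample = "En un lugar de la Mancha"
letters = letter_frequencies(sample)
lengths = word_length_frequencies(sample)

# Most common letters and word lengths, highest counts first.
print(letters.most_common(5))
print(sorted(lengths.items()))
```

Running the same two functions on the plaintext and on the ciphered text would give two frequency tables to compare, provided the cipher preserves word boundaries, which is also an assumption here.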
2013 Apr 28
3
Dovecot Solr Panic
...run(QueuedThreadPool.java:582) Caused by: javax.xml.stream.XMLStreamException: ParseError at [row,col]:[1188418,131] Message: Premature end of file. at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.next(XMLStreamReaderImpl.java:592) at org.apache.solr.handler.XMLLoader.readDoc(XMLLoader.java:273) at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:138) at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69) ... 22 more ...
2009 Dec 11
0
readHTML within tm package
...uments. In the documentation and in the tutorial material it says that there is a readHTML routine that can be used to read HTML documents into a corpus. However, when I try to use that routine I get an error. When I run getReaders (below) readHTML isn't listed.
> getReaders()
[1] "readDOC"                 "readGmane"
[3] "readPDF"                 "readReut21578XML"
[5] "readReut21578XMLasPlain" "readPlain"
[7] "readRCV1"                "readTabular"
Am I missing some...
2010 Feb 04
1
How to read HTML or TEXT file with tm package
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20100204/a3069c99/attachment.pl>
2012 Nov 18
4
panic fts_solr for bad attachment
...eamScanner.loadMore(StreamScanner.java:994) at com.ctc.wstx.sr.StreamScanner.getNext(StreamScanner.java:754) at com.ctc.wstx.sr.BasicStreamReader.nextFromTree(BasicStreamReader.java:2691) at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1065) at org.apache.solr.handler.XMLLoader.readDoc(XMLLoader.java:309) at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:156) at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:79) ... 19 more Caused by: java.io.CharConversionException: Invalid UTF-8 start byte 0xfc (at char #25214836, byte #26687495) at com.ctc.wstx.io.UT...
2012 Dec 31
5
2.1.12: Panic: file solr-connection.c: line 547 (solr_connection_post_more)
...18) at com.ctc.wstx.sr.StreamScanner.throwLazyError(StreamScanner.java:731) at com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3657) at com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:809) at org.apache.solr.handler.XMLLoader.readDoc(XMLLoader.java:315) at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:156) at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:79) at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:58) at org.apac...