Displaying 7 results from an estimated 7 matches for "readdoc".
2009 Aug 17
2
reading in MS Word files
I am familiar with packages that read and write Excel files on both Windows
and Linux platforms.
Do any packages provide similar functionality for MS Word files? I have a
lot of text processing to do and the text is embedded in ~200 different Word
files (.doc format Office 2003). All I need to do is read, not write.
Thanks,
Mark
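A minimal sketch of read-only batch extraction from ~200 .doc files in Python. It assumes the command-line tool antiword is installed and on PATH; as far as I know, tm's readDOC (listed in the getReaders() output further down) relies on the same tool. The function names are hypothetical. Decoding the output as UTF-8 is also an assumption, because antiword's output encoding depends on its character-mapping settings.

```python
import subprocess
from pathlib import Path


def is_word_doc(path: Path) -> bool:
    """True for Word 97-2003 files (.doc, any case); .docx is excluded."""
    return path.suffix.lower() == ".doc"


def antiword_text(path: Path) -> str:
    """Extract plain text from one .doc file by running the external
    `antiword` tool (assumed installed). Raises CalledProcessError if
    antiword fails. Undecodable bytes are replaced, not fatal."""
    result = subprocess.run(["antiword", str(path)], capture_output=True, check=True)
    return result.stdout.decode("utf-8", errors="replace")


def read_all(directory, reader=antiword_text) -> dict:
    """Map file name -> extracted text for every .doc file in `directory`,
    in name order. `reader` is pluggable so another extractor can be used."""
    files = sorted(p for p in Path(directory).iterdir() if p.is_file() and is_word_doc(p))
    return {p.name: reader(p) for p in files}
```

For example, read_all("/path/to/docs") (a placeholder path) would return one string per document, ready for the text processing. Keeping the reader as a parameter means the directory scan does not change if antiword is swapped for a different converter.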
2009 Aug 05
2
reading and frequency analysis of Spanish text
For an historical paper I'm working on, I have some Spanish plaintext,
presently in the form of a Word .doc file,
http://euclid.psych.yorku.ca/SCS/Gallery/images/Private/Langren/Verdadera-spanish-stripped.doc
and also some ciphered text from the same original source. The ultimate
goal is to use some frequency analysis of letters and word lengths in the
plaintext to help decode the
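The letter and word-length frequency analysis described above can be sketched in Python as follows, assuming the plaintext has already been extracted from the .doc file into a string. The function names are hypothetical. Letters are lowercased, and accented letters such as á or ñ are counted as separate symbols from a or n, which may or may not suit the cipher.

```python
import re
from collections import Counter

# Runs of letters only: [^\W\d_] matches any Unicode letter (including ñ, á, ü)
# while excluding digits and underscores.
WORD_RE = re.compile(r"[^\W\d_]+")


def letter_frequencies(text: str) -> Counter:
    """Count each letter after lowercasing; spaces, digits and punctuation are skipped."""
    return Counter(ch for ch in text.lower() if ch.isalpha())


def word_length_distribution(text: str) -> Counter:
    """Map word length -> number of words of that length."""
    return Counter(len(word) for word in WORD_RE.findall(text))


def relative(counter: Counter) -> dict:
    """Convert raw counts to proportions that sum to 1, most common first."""
    total = sum(counter.values())
    return {key: count / total for key, count in counter.most_common()}
```

The relative frequencies from the plaintext can then be set side by side with the same counts computed on the ciphered text.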
2013 Apr 28
3
Dovecot Solr Panic
...run(QueuedThreadPool.java:582)
Caused by: javax.xml.stream.XMLStreamException: ParseError at [row,col]:[1188418,131]
Message: Premature end of file.
at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.next(XMLStreamReaderImpl.java:592)
at org.apache.solr.handler.XMLLoader.readDoc(XMLLoader.java:273)
at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:138)
at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
... 22 more
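"Premature end of file" from the StAX reader means the request body ended before the XML document was closed, so the update Solr received (here at row 1188418) was truncated. A minimal Python sketch, not Dovecot's actual C code, of building a Solr XML update message with ElementTree and checking that it is complete before it is posted; the field names id and body are hypothetical.

```python
import xml.etree.ElementTree as ET


def solr_add_message(fields: dict) -> bytes:
    """Build <add><doc><field name=...>value</field>...</doc></add>.
    ElementTree escapes &, < and > in field values."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for name, value in fields.items():
        field = ET.SubElement(doc, "field", name=name)
        field.text = str(value)
    return ET.tostring(add, encoding="utf-8")


def is_complete_xml(payload: bytes) -> bool:
    """False if the payload is not a single well-formed XML document,
    for example because it was cut off mid-stream."""
    try:
        ET.fromstring(payload)
    except ET.ParseError:
        return False
    return True
```

A payload cut off partway through, as in the trace above, fails this check in the same way Solr's parser fails on it.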
2009 Dec 11
0
readHTML within tm package
...uments. In the
documentation and in the tutorial material it says that there is a
readHTML routine that can be used to read HTML documents into a corpus.
However, when I try to use that routine I get an error. When I run
getReaders (below) readHTML isn't listed.
> getReaders()
[1] "readDOC" "readGmane"
[3] "readPDF" "readReut21578XML"
[5] "readReut21578XMLasPlain" "readPlain"
[7] "readRCV1" "readTabular"
Am I missing some...
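Since readPlain does appear in the getReaders() output, one workaround (an assumption, not something the tm documentation prescribes) is to reduce HTML to plain text before loading it. A minimal Python sketch using the standard-library HTML parser; html_to_text is a hypothetical name. Text pieces are joined with spaces, so a word split across inline tags (e.g. <b>Hel</b>lo) comes out as two words.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text content, skipping everything inside <script> and <style>."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True: &amp; etc. arrive decoded
        self._skip = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def html_to_text(html: str) -> str:
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse the whitespace runs left behind by the removed markup.
    return " ".join(" ".join(parser.parts).split())
```

The resulting text can be written to .txt files and read into a corpus with readPlain.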
2010 Feb 04
1
How to read HTML or TEXT file with tm package
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20100204/a3069c99/attachment.pl>
2012 Nov 18
4
panic fts_solr for bad attachment
...eamScanner.loadMore(StreamScanner.java:994)
at com.ctc.wstx.sr.StreamScanner.getNext(StreamScanner.java:754)
at com.ctc.wstx.sr.BasicStreamReader.nextFromTree(BasicStreamReader.java:2691)
at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1065)
at org.apache.solr.handler.XMLLoader.readDoc(XMLLoader.java:309)
at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:156)
at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:79)
... 19 more
Caused by: java.io.CharConversionException: Invalid UTF-8 start byte 0xfc (at char #25214836, byte #26687495)
at com.ctc.wstx.io.UT...
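Byte 0xfc can never begin a valid UTF-8 sequence; in ISO-8859-1 it is "ü", which suggests (an assumption, not confirmed by the trace) that Latin-1 attachment text was passed through as if it were UTF-8. A minimal Python sketch, not how Dovecot's fts_solr handles it, that locates the first offending byte and falls back to Latin-1 so that only valid UTF-8 would be sent; both function names are hypothetical.

```python
def first_invalid_utf8_byte(raw: bytes):
    """Return (offset, byte value) of the first byte that breaks UTF-8
    decoding, or None if the whole input is valid UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start, raw[err.start]
    return None


def to_text(raw: bytes, fallback: str = "latin-1") -> str:
    """Decode as UTF-8; if that fails, decode with `fallback` instead.
    With the default Latin-1 fallback this never raises, because Latin-1
    maps every byte value to a character."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode(fallback)
```

Re-encoding the result with .encode("utf-8") gives bytes that a UTF-8 XML parser will accept; whether Latin-1 is the right guess depends on the message's declared charset.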
2012 Dec 31
5
2.1.12: Panic: file solr-connection.c: line 547 (solr_connection_post_more)
...18)
at com.ctc.wstx.sr.StreamScanner.throwLazyError(StreamScanner.java:731)
at com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3657)
at com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:809)
at org.apache.solr.handler.XMLLoader.readDoc(XMLLoader.java:315)
at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:156)
at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:79)
at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:58)
at org.apac...
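This trace is cut off before its cause. Woodstox appears to parse text lazily, so an error in a field's content only surfaces when getText() is called. One possible cause, assumed here and not shown in the excerpt, is a character that XML 1.0 forbids outright, such as NUL or most other control characters. A minimal Python sketch that removes or replaces anything outside the XML 1.0 Char production before the text is put into an update; the function name is hypothetical.

```python
import re

# Complement of the XML 1.0 "Char" production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_ILLEGAL_XML_CHARS = re.compile(
    "[^\u0009\u000a\u000d\u0020-\ud7ff\ue000-\ufffd\U00010000-\U0010ffff]"
)


def strip_illegal_xml_chars(text: str, replacement: str = "") -> str:
    """Replace every character XML 1.0 does not allow (e.g. NUL, vertical
    tab, U+FFFE, lone surrogates) with `replacement`."""
    return _ILLEGAL_XML_CHARS.sub(replacement, text)
```

Escaping with &#...; does not help here, because XML 1.0 forbids these characters even as character references.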