2011 Feb 09
Not separating words when parsing HTML in Omega
While indexing a Word 2007 document, we noticed that the last word of
one paragraph and the first word of the next were merged into a single
term in the Xapian database. For example:
To find the document containing
these two paragraphs...
...you would search for "containingthese".
I fixed it locally by adding the line dump.append(" "); just before the
return in process_text() in myhtmlparse.cc. Thought I'd mention it to
see if anyone could put in a better/more permanent fix.
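
To illustrate the effect of the workaround, here is a minimal standalone
sketch. It is not the actual Omega code: ToyParser, extract() and the
"separate" flag are hypothetical names invented for this example, and only
the dump buffer and the process_text() name come from the description
above. It assumes each paragraph reaches process_text() as a separate run
of text.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the HTML parser: only the `dump` buffer
// mentioned above is modeled; the real class in myhtmlparse.cc does more.
struct ToyParser {
    std::string dump;
    bool separate;  // true = apply the workaround (assumed flag, for comparison only)

    explicit ToyParser(bool separate_) : dump(), separate(separate_) {}

    // Called once per run of text found between tags.
    void process_text(const std::string& text) {
        dump.append(text);
        if (separate) {
            dump.append(" ");  // the one-line workaround described above
        }
    }
};

// Feed each paragraph's text through the toy parser and return the dump.
std::string extract(const std::vector<std::string>& paragraphs, bool separate) {
    ToyParser parser(separate);
    for (const std::string& para : paragraphs) {
        parser.process_text(para);
    }
    return parser.dump;
}
```

With separate set to false, the two example paragraphs come out joined as
"containingthese"; with it set to true, a space keeps the words apart. In
this sketch the space is added after every text run, so text split by
inline markup inside a single word would also be separated, which may be
one reason a more targeted fix would be preferable.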
I could send a sample document that produces the error, if that helps.
-...