thr3ads.net - similar to: "[Fwd: Irix install of omega fails.]"

Displaying 20 results from an estimated 1000 matches similar to: "[Fwd: Irix install of omega fails.]"

2012 Dec 29

omindex killed

I'm finding that omindex is consistently ending prematurely when indexing certain files. The last output looks like this: [Entering directory /compounds/Acetic_acid] Indexing "/MATLAB/compounds/Acetic_acid/AACID_50T.TXT" as text/plain ... added. Indexing "/MATLAB/compounds/Acetic_acid/AACID_50T.pdf" as application/pdf ... "pdftotext -enc UTF-8

2007 Mar 28

Moving indextext.cc into core.

One of the items on the ToDo list for version 1.0 at http://wiki.xapian.org/TodoFor1_2e0#preview is: "Rework Omega's indextext.cc as a xapian-core "TextSplitter" class." I've been wondering about this for a while now. Currently, we have the Query Parser in Xapian core, but no text processing. Clearly, it makes sense to have a "text splitter" class in

2004 Nov 22

Test builds for CYGWIN and IRIX?

I'm starting to prepare the next release. Since 0.8.3 I've made a number of changes to get builds working on HPUX and OSF, and made some of the Windows-specific bits more robust. I'd like to check that these haven't broken the CYGWIN or IRIX builds, but I don't have access to these platforms. If you are able to test, it would be much appreciated. Download a

2013 Feb 21

Getting htmlParse to work with Hebrew? (on windows)

Hello dear R-help mailing list. Looks like the same issue in Russian: library(RCurl) library(XML) u = " http://www.cian.ru/cat.php?deal_type=2&obl_id=1&room1=1" a = getURL(u) a # Here - the Russian is fine. a2 <- htmlParse(a) a2 # Here it is a mess... None of these seem to fix it: htmlParse(a, encoding = "windows-1251") htmlParse(a, encoding =

2013 Mar 20

htmlParse (from XML library) working sporadically in the same code

I am using htmlParse from the XML library on a particular website. Sometimes the code fails and sometimes it works; most of the time it doesn't, and I cannot see why. The file I am trying to parse is http://www.londonstockexchange.com/exchange/prices-and-markets/international-markets/indices/home/sp-500.html?page=0 Sometimes the following code works n<-readHTMLTable(htmlParse(url)) But most of the

2012 May 21

htmlParse Error

I am trying to parse a webpage using the htmlParse command in the XML package as follows: library(XML) u = "http://en.wikipedia.org/wiki/World_population" doc = htmlParse(u) I get the following error: Error in htmlParse(u) : error in creating parser for http://en.wikipedia.org/wiki/World_population I am using R 2.13.1 (32-bit version) on 64-bit Windows. (I tried installing it in

2012 Jan 30

Getting htmlParse to work with Hebrew? (on windows)

Hello dear R-help mailing list. I would like htmlParse to work well with Hebrew, but it keeps scrambling the Hebrew text in pages I feed into it. For example: # why can't I parse the Hebrew correctly? library(RCurl) library(XML) u = "http://humus101.com/?p=2737" a = getURL(u) a # Here - the Hebrew is fine. a2 <- htmlParse(a) a2 # Here it is a mess... None of
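Both Hebrew threads (and the Russian follow-up) describe text that looks fine after getURL() but is scrambled after parsing, which is the usual symptom of bytes being decoded with the wrong character encoding. As a language-neutral illustration (Python rather than R's XML package; the sample word and the `decode_page` helper are assumptions), the sketch below decodes the same UTF-8 bytes twice:

```python
def decode_page(raw: bytes, encoding: str) -> str:
    """Decode raw page bytes with an explicitly chosen character encoding."""
    return raw.decode(encoding, errors="replace")

hebrew = "שלום"                 # sample Hebrew word ("shalom")
raw = hebrew.encode("utf-8")    # bytes as a UTF-8 web server would send them

# Wrong guess: Latin-1 maps every byte to its own character, so each
# two-byte Hebrew letter turns into two unrelated Latin-1 characters.
wrong = decode_page(raw, "latin-1")
# Right guess: UTF-8 reassembles each two-byte sequence into one letter.
right = decode_page(raw, "utf-8")

print(len(raw), len(wrong), len(right))  # 8 8 4
print(right == hebrew, wrong == hebrew)  # True False
```

Which encoding a given page actually uses has to be taken from its HTTP headers or meta tags; the sketch only shows why a wrong assumption produces a mess.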

2005 Dec 30

Query Parser, filenames and compound words

When I submit a filename to the query parser, it breaks it up. Example: /home/user/file_name.ext becomes Xapian::Query((home:(pos=1) PHRASE 5 user:(pos=2) PHRASE 5 file:(pos=3) PHRASE 5 name:(pos=4) PHRASE 5 ext:(pos=5))) which does not find the document. If I do a single-term query without using the query parser, then I find the document. The Query Parser also breaks up hyphenated terms
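The behaviour in this report can be reproduced in miniature: a parser that splits on punctuation turns the path into five words searched as a phrase, while the document was only findable by the whole path as one term. The sketch below uses a simplified Python tokenizer (an assumption, not Xapian's actual QueryParser rules) and a hypothetical one-term index:

```python
import re

def split_terms(text: str) -> list[str]:
    """Simplified word splitter: lowercase runs of ASCII letters and digits.

    Only an approximation of a query parser's tokenizer, not Xapian's rules.
    """
    return re.findall(r"[a-z0-9]+", text.lower())

# Hypothetical index in which the document's path was stored as ONE term.
indexed_terms = {"/home/user/file_name.ext"}

query = "/home/user/file_name.ext"
phrase_terms = split_terms(query)

print(phrase_terms)  # ['home', 'user', 'file', 'name', 'ext']
# None of the split-up words exists as a term, so the phrase query misses...
print(any(t in indexed_terms for t in phrase_terms))  # False
# ...while looking up the path unchanged, as a single term, matches.
print(query in indexed_terms)  # True
```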

2009 May 19

omindex options

Hi. I am writing a Python equivalent of omindex (we are currently using scriptindex, but I wanted to use omindex instead and extend it to work with our internal file format, BUT did not want to compile code if possible... so anyway). I have tried to keep the code as close as possible to the native omindex code, but am facing a bit of confusion: what exactly is the reason for omindex to take

2011 Oct 18

patch proposal: omindex library or daemon

Olly (looking at commit logs, I think this is your dept :-) For apps which re/index files frequently and need format conversion, I'd like to propose a patch for one of... Omindex library (thread safe): Omindex::init(options) // struct Omindex::options { ... } initialize mime_map, store default options session = new Omindex::Session(db_pathname) user threads use different sessions

2011 Sep 05

htmlParse hangs or crashes

Dear colleagues, each time I use htmlParse, R crashes or hangs. The URL I'd like to parse is included below, as are the results of a series of basic commands that describe what I'm experiencing. The results of sessionInfo() are attached at the bottom of the message. The thing is, htmlTreeParse appears to work just fine, although it doesn't appear to contain the information I need (the

2012 Dec 13

omindex one file at a time?

Hi, all -- I want to do Plain Old Omindex'ing *but* the mapping between my documents' filenames and the URLs where I hope search users will find them is, uh..., strange. The simplest thing (to me) would be to run omindex for each document, e.g. omindex --no-delete -U /cool-url-1 /funky/doc/file-blah.pdf omindex --no-delete -U /cool-url-7 /doc/funky/ohmy/blah-file.txt ... and so on...

2007 Jul 12

omega: omindex behaviour with duplicate files

Hi all, I need a little clarification regarding Omega's behaviour with 'duplicate' files when running 'omindex'. How is a duplicate recognised? Is it simply by file path? How is an unmodified file detected, if at all? I would like to set up a Subversion post-commit hook to update my index. If possible, I would like to update the index with just the newly committed files.
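For the questions about duplicates and unmodified files, one common approach for an incremental indexer is to identify each document by its path and compare a stored modification time. The Python sketch below illustrates only that idea; it is an assumption, not a description of what omindex actually does, and `files_to_reindex` and the dict-based index are hypothetical:

```python
import os
import tempfile

def files_to_reindex(paths, index):
    """Return the paths that are new or modified since they were last indexed.

    `index` is a hypothetical dict mapping file path -> mtime seen at index
    time; it is updated in place for every path returned.
    """
    changed = []
    for path in paths:
        mtime = os.stat(path).st_mtime
        if index.get(path) != mtime:  # path never seen, or timestamp differs
            changed.append(path)
            index[path] = mtime
    return changed

# Usage: a file is reported once, skipped while unchanged, and reported
# again after its modification time moves forward.
with tempfile.TemporaryDirectory() as d:
    doc = os.path.join(d, "notes.txt")
    with open(doc, "w") as f:
        f.write("hello")
    index = {}
    first = files_to_reindex([doc], index)
    second = files_to_reindex([doc], index)
    st = os.stat(doc)
    os.utime(doc, (st.st_atime, st.st_mtime + 10))  # simulate a new commit
    third = files_to_reindex([doc], index)

print(first == [doc], second == [], third == [doc])  # True True True
```

With this scheme, a post-commit hook could pass just the committed paths instead of walking the whole tree.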

2014 Mar 11

[GSOC 2014] Indexing INEX dataset

Hi Parth, I've implemented the SVMRanker class and also sorted out most of the current Letor APIs. Now I'm trying to use the INEX dataset to verify my implementation, but I'm stuck on the indexing part. You said in the documentation that we have to add prefixes when indexing. I also noticed that you set some metadata in omindex.cc in your version, but omindex.cc has changed since 2011. I think that's why my result

2010 Mar 11

parse an HTML page with verbose error message (using XML)

I'm using the function htmlParse() in the XML package, and I need a little help with error handling while parsing an HTML page. So far I can use either the default way: # error = xmlErrorCumulator(), by default library(XML) doc = htmlParse("http://www.public.iastate.edu/~pdixon/stat500/") # the error message is: # htmlParseStartTag: invalid element name or the tryCatch()

2009 Jun 20

omindex hangs while scanning

Hello, I was looking for a search engine for a small internal documentation site and found Xapian and Omega. I downloaded and compiled it using MSYS and MinGW on a German Windows XP system, and finally installed Apache on the same box. Following the Omega example, I copied the book to .../apache/htdocs and started omindex, which hung on the first document found. Even on a very short doc with

2012 May 19

Try Giving Invalid Argument Type Error

Dear R Helpers, I am getting an error message from the try function that I don't understand, so I am hoping that someone can help. I am scraping web pages, but sometimes they disappear. When that happens I need to handle it with some sort of function. This web page is parsed without a problem. exh<-"NASDAQ" tic<-"EGHT"
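The pattern the poster is after (keep scraping when a page has vanished) looks like this in Python. It is shown as an analogue of R's try()/tryCatch(), not as a diagnosis of the error in the thread; `scrape_all` and `fake_fetch` are hypothetical:

```python
def scrape_all(tickers, fetch):
    """Fetch a page per ticker, recording None instead of aborting on failure.

    `fetch` is a hypothetical function that returns page text or raises an
    exception, e.g. when the page has disappeared.
    """
    results = {}
    for tic in tickers:
        try:
            results[tic] = fetch(tic)
        except Exception:  # missing or unparseable page: note it and move on
            results[tic] = None
    return results

# Usage with a stand-in fetcher: "GONE" simulates a page that has vanished.
pages = {"EGHT": "<html>...</html>"}

def fake_fetch(tic):
    return pages[tic]  # raises KeyError for unknown tickers

out = scrape_all(["EGHT", "GONE"], fake_fetch)
print(out)  # {'EGHT': '<html>...</html>', 'GONE': None}
```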

2009 Jun 30

How to pass parameters to htmlParse Bank of Canada html pages

To get USDCAD rates from the Bank of Canada, we first go to url <- "http://banqueducanada.ca/en/rates/exchange-avg.html", select 12 months for Rates for the past and click the "Get Rates" button. Then the page moves to address <- "http://banqueducanada.ca/cgi-bin/famecgi_fdps" and the rates show in the HTML page. htmlParse() can read the HTML document but

2017 Apr 20

Question about the ticket #743 omindex: delay libmagic checks

Hi, I'm working on ticket #743 "omindex: delay libmagic checks" <https://trac.xapian.org/ticket/743>. As the ticket's description mentions, calling libmagic is more expensive than calling stat, so we can check the size with stat before calling libmagic to get a MIME type. But what about the timestamp check? Since the timestamp check needs to iterate the DB to check if
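The reordering the ticket describes (cheap checks from stat() before the expensive libmagic call) can be sketched in Python as below. This is an illustration under assumptions, not omindex's code: `should_index`, `fake_detect`, and passing in the previously indexed timestamp are all hypothetical, and the sketch does not answer how that timestamp is fetched from the database.

```python
import os
import tempfile

def should_index(path, max_size, last_indexed_mtime, detect_mime):
    """Return a MIME type if `path` should be indexed, else None.

    Cheap stat() checks run first; `detect_mime` stands in for an expensive
    call such as a libmagic lookup and runs only if those checks pass.
    `last_indexed_mtime` is assumed to come from a hypothetical DB lookup.
    """
    st = os.stat(path)                     # one inexpensive system call
    if st.st_size > max_size:              # too large: skip, no libmagic call
        return None
    if last_indexed_mtime is not None and st.st_mtime <= last_indexed_mtime:
        return None                        # unchanged since the last run
    return detect_mime(path)               # expensive check, done last

calls = []

def fake_detect(path):
    calls.append(path)                     # count how often the costly step runs
    return "text/plain"

with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write("0123456789")                  # 10 bytes
    name = f.name
try:
    mtime = os.stat(name).st_mtime
    too_big = should_index(name, 5, None, fake_detect)
    unchanged = should_index(name, 100, mtime, fake_detect)
    fresh = should_index(name, 100, None, fake_detect)
finally:
    os.remove(name)

print(too_big, unchanged, fresh, len(calls))  # None None text/plain 1
```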

2009 Feb 02

Ticket #282: omindex-assorted-enhancements.patch woes

I would really like to try out the features in the patch above, but I can never seem to get the resulting omindex.cc to "make". I tried updating to rev 10801 from SVN and then ran ./bootstrap, but then I seem to get errors compiling everything when I try to do "make" (I'm using Ubuntu 8.10). So I thought I'd try to apply the patch to the latest stable version
