search for: htdig2omega

Displaying 5 results from an estimated 5 matches for "htdig2omega".

2006 Mar 29
1
htdig with omega for multiple URLs (websites)
...ases to allow search of integrated results. If you still have around the script you said you wrote to use htdig as crawler front-end for omega, I would be really interested to see it. My htdig crawls single site. I need to learn how to crawl multiple sites and merge results. Do you recall your htdig2omega script handling this merging? Or you searched one htdig-crawled database? Or can I merge using htdig and then search using omega? Thanks for any insight which way to start looking. Also if anyone on list has experience with using htdig to crawl multiple websites, I would really appreciate insi...
2011 Apr 17
3
Report for http://trac.xapian.org/wiki/SupportedPlatforms
...mples/ #/usr/share/doc/xapian-omega/TODO.Debian Not found #/usr/share/doc/xapian-omega/changelog.Debian.gz Not found #/usr/share/doc/xapian-omega/changelog.gz Not found #/usr/share/doc/xapian-omega/copyright Not found cp -p dbi2omega $root/usr/share/doc/xapian-omega/examples/dbi2omega cp -p htdig2omega $root/usr/share/doc/xapian-omega/examples/htdig2omega cp -p htdig2omega.script $root/usr/share/doc/xapian-omega/examples/htdig2omega.script cp -p mbox2omega $root/usr/share/doc/xapian-omega/examples/mbox2omega cp -p mbox2omega.script $root/usr/share/doc/xapian-omega/examples/mbox2omega.script mkdir...
2006 May 26
1
Unicode troubles
...1927 Now the QueryParser works as I want it to, and creates the terms correctly. But sadly I can't find any documents. If I do this; $ quest -d /var/lib/xapian r?serbil -> no results $ query -d /var/lib/xapian r*serbil -> result I'm indexing the pages from a htdig database using htdig2omega. I've tried to parse the db.docs-file as generated by htdump or after it's been converted to utf-8 by iconv. I've also tried to replace the p_* functions in scriptindex.cc to U_ ones -- just like the first patch does -- but I'm unable to get it to work. Any ideas what I am doing wr...
2007 Feb 08
1
Getting custom field data from the page through crawling
...nd_date" content="2007-02-16" /> That won't work the best though, because htdig won't store that information in a meaningful way to allow me to retrieve it in order to set the fields myself later. So, the one workaround solution I could come up with was to maybe edit the htdig2omega script, and for each doc read from db.docs, I then do an HTTP request on the URL, read it, parse these tags and then print the fields, which will map to the settings I specify in htdig2omega.script. But of course, I'm doing two page lookups when I spider the site.. Once for the main htdig cra...
2006 Mar 17
1
omega crawler: ht://dig or wget?
At wiki page: http://wiki.xapian.org/Omega I added a comment that ht://Dig looks like dead. Does anybody really use it? From brief glance at docs I had a feeling it is not easy to configure. Maybe better crawler is GNU wget? Mature, stable, maintained? -- Peter Masiar