Displaying 5 results from an estimated 5 matches for "htdig2omega".
2006 Mar 29
1
htdig with omega for multiple URLs (websites)
...ases to
allow search of integrated results.
If you still have the script you said you wrote to use htdig as a
crawler front-end for omega, I would be really interested to see it.
My htdig crawls a single site. I need to learn how to crawl multiple sites
and merge the results. Do you recall your htdig2omega script handling this
merging? Or did you search one htdig-crawled database? Or can I merge
using htdig and then search using omega?
Thanks for any insight into which way to start looking.
Also, if anyone on the list has experience with using htdig to crawl multiple
websites, I would really appreciate insi...
2011 Apr 17
3
Report for http://trac.xapian.org/wiki/SupportedPlatforms
...mples/
#/usr/share/doc/xapian-omega/TODO.Debian Not found
#/usr/share/doc/xapian-omega/changelog.Debian.gz Not found
#/usr/share/doc/xapian-omega/changelog.gz Not found
#/usr/share/doc/xapian-omega/copyright Not found
cp -p dbi2omega $root/usr/share/doc/xapian-omega/examples/dbi2omega
cp -p htdig2omega $root/usr/share/doc/xapian-omega/examples/htdig2omega
cp -p htdig2omega.script $root/usr/share/doc/xapian-omega/examples/htdig2omega.script
cp -p mbox2omega $root/usr/share/doc/xapian-omega/examples/mbox2omega
cp -p mbox2omega.script $root/usr/share/doc/xapian-omega/examples/mbox2omega.script
mkdir...
2006 May 26
1
Unicode troubles
...1927
Now the QueryParser works as I want it to, and creates the terms
correctly. But sadly I can't find any documents. If I do this:
$ quest -d /var/lib/xapian r?serbil -> no results
$ query -d /var/lib/xapian r*serbil -> result
I'm indexing the pages from an htdig database using htdig2omega. I've
tried parsing the db.docs file both as generated by htdump and after it's
been converted to utf-8 by iconv. I've also tried replacing the p_*
functions in scriptindex.cc with U_ ones -- just like the first patch
does -- but I'm unable to get it to work.
Any ideas what I am doing wr...
2007 Feb 08
1
Getting custom field data from the page through crawling
...nd_date" content="2007-02-16" />
That won't work very well, though, because htdig won't store that information in a
meaningful way that lets me retrieve it and set the fields myself later. So
the one workaround I could come up with was to edit the htdig2omega
script so that, for each doc read from db.docs, it does an HTTP request on the URL, reads
the page, parses these tags and then prints the fields, which map to the settings I specify
in htdig2omega.script. But of course, I'm then doing two page lookups when I spider the
site: once for the main htdig cra...
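The tag-parsing step of the workaround described above could look roughly like the following Python sketch. It is only an illustration, not the htdig2omega script itself. It assumes the page has already been fetched (the HTTP request is left out), and it assumes the usual scriptindex input layout of one `name=value` line per field with a blank line between records. The field names `url` and `example_date` are hypothetical placeholders; real names would have to match whatever the index script (htdig2omega.script in the post) expects.

```python
from html.parser import HTMLParser


class MetaTagParser(HTMLParser):
    """Collect <meta name="..." content="..."> pairs from an HTML page."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta ... /> also end up here.
        if tag != "meta":
            return
        attr_map = dict(attrs)
        name = attr_map.get("name")
        content = attr_map.get("content")
        if name and content is not None:
            self.meta[name.lower()] = content


def extract_meta(html_text):
    """Return a dict mapping lower-cased meta names to their content values."""
    parser = MetaTagParser()
    parser.feed(html_text)
    parser.close()
    return parser.meta


def format_record(url, meta, wanted_fields):
    """Build one record: name=value lines followed by a blank line.

    The field names are hypothetical; they must match the index script in use.
    """
    lines = ["url=" + url]
    for field in wanted_fields:
        if field in meta:
            # Collapse whitespace so a value never spans several lines.
            value = " ".join(meta[field].split())
            lines.append(field + "=" + value)
    return "\n".join(lines) + "\n\n"
```

For each URL read from db.docs, the fetched HTML would be passed to `extract_meta`, and the result of `format_record` appended to the dump that is fed to the indexer. This does not remove the second page lookup the poster mentions; it only shows what the extra pass would do with each page.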
2006 Mar 17
1
omega crawler: ht://dig or wget?
On the wiki page http://wiki.xapian.org/Omega
I added a comment that ht://Dig looks dead.
Does anybody really use it?
From a brief glance at the docs, I had the feeling it is not easy to configure.
Maybe GNU wget is a better crawler? Mature, stable, maintained?
--
Peter Masiar