Good morning all,

We run a network of sites; they're separate entities, hosted on multiple servers, and they require separate indexing. Sites may be updated at any time, and we'd like to be able to update our databases incrementally.

Our systems are PHP-based, so it was easy to start off by using Sphider, but we seem to be on our way to outgrowing it. I'm working through the Xapian docs and the mailing list archive, but clues are scattered across the years, and I'm interested in current thoughts about best practice.

One option seems to be to crawl our sites with htdig, keeping all indexes on a master server. I'm setting up a system using htdig2omega this morning, but at first glance, it seems as if we'd lose the ability to do incremental updates this way.

The other option would be to keep the search facilities with each site and not use htdig or wget at all. This seems like a better way to go if server resources allow it.

I also wonder if flint is the predominant database format these days, or if there are reasons to stay with quartz.

Thanks in advance for your input,
Eric
On Mon, Sep 11, 2006 at 09:59:46AM -0700, Eric Theise wrote:
> One option seems to be to crawl our sites with htdig, keeping all indexes on
> a master server. I'm setting up a system using htdig2omega this morning,
> but at first glance, it seems as if we'd lose the ability to do incremental
> updates this way.

Using htdig2omega probably doesn't lend itself to incremental updates especially well.

Since you run all these sites, perhaps the simplest way to index centrally is to rsync the sites (probably over ssh) to a set of mirrored document trees on a server which handles the search. Just run rsync right before you do an incremental update (and if rsync reports no changes, you can take a shortcut and skip the update).

> I also wonder if flint is the predominant database format these days, or if
> there are reasons to stay with quartz.

I'd probably recommend using the current incarnation of flint at this point. My current thoughts are to make the current state of flint the default backend for 1.0, and then continue development on a branched copy. The hardest part is what to name things...

Cheers,
    Olly
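The "rsync first, update only if something changed" idea can be sketched roughly as below. This is an illustrative sketch, not part of Xapian or omega. It assumes rsync's `-i` (`--itemize-changes`) output, which prints one line per changed item (and a `*deleting` line per removal under `--delete`) and nothing for unchanged files. The host names, paths and `index_cmd` are hypothetical placeholders for your own sites and whatever incremental indexing step you use. The `runner` parameter exists only so the logic can be exercised without a real rsync.

```python
import re
import subprocess
from typing import Callable, List, Sequence

# Assumed rsync --itemize-changes format: an update-type char (<, >, c, h, .)
# followed by a file-type char (f, d, L, D, S) and further flag chars,
# or a "*deleting" line emitted when --delete removes a file.
_ITEMIZED = re.compile(r"^(?:[<>ch.][fdLDS]\S*|\*deleting)\s")


def rsync_command(remote: str, mirror: str) -> List[str]:
    """Build an rsync call that mirrors `remote` into `mirror` over ssh."""
    # -a: archive mode; --delete: drop files removed on the live site;
    # -i: itemize changed items only, so unchanged trees print nothing.
    return ["rsync", "-a", "--delete", "-i", "-e", "ssh", remote, mirror]


def has_changes(rsync_output: str) -> bool:
    """Return True if rsync's itemized output lists at least one change."""
    return any(_ITEMIZED.match(line) for line in rsync_output.splitlines())


def run(cmd: Sequence[str]) -> str:
    """Run a command, raising on a non-zero exit, and return its stdout."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout


def sync_and_maybe_index(
    remote: str,
    mirror: str,
    index_cmd: Sequence[str],
    runner: Callable[[Sequence[str]], str] = run,
) -> bool:
    """Mirror one site, then run the indexer only if rsync changed something.

    Returns True if the indexing command was run.
    """
    output = runner(rsync_command(remote, mirror))
    if not has_changes(output):
        return False  # shortcut: the mirror is already up to date
    runner(index_cmd)
    return True
```

One call per site would then look something like `sync_and_maybe_index("www1.example.com:/var/www/site1/", "/srv/mirror/site1/", ["/usr/local/bin/reindex-site1"])`, where the host, paths and reindex script are all made-up examples. Checking rsync's own output avoids a second pass over the mirrored tree just to discover that nothing needs reindexing.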