On Fri, Nov 04, 2011 at 08:27:20AM +0100, james cauwelier wrote:
> I've got a database with around 7 000 000 products.  Running a query
> for the first time gives pretty slow results, e.g. 6 seconds for
> searching 'Harry Potter' (normal search, no phrase).  Running the
> exact same query returns in like 24ms or so.
> 
> I have read this to be because of the disk reading the first time and
> that for the second query results are in RAM cache?
> http://permalink.gmane.org/gmane.comp.search.xapian.general/8569

Yes.

> Is there no way to speed up the initial query?  What is cached, search
> results or some part of the index?  Is there a way to have a hot cache
> at all times for generic queries as opposed to very specific queries?

The caching we're talking about here is done by the OS - which will just
cache recently seen blocks from the files which make up the index.
Running a handful of common queries is enough to usefully warm up the
cache, as that will ensure that the upper branch blocks of the Btrees
are cached, which will mean that at worst only a few leaf blocks need
to be loaded for a typical search.
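
As a rough illustration of that warm-up step, here is a minimal sketch in Python.  The names warm_cache and make_xapian_search are made up for this example, as are the query list and database path; the Xapian calls are the standard ones from the Python bindings, but treat the whole thing as a starting point rather than a tested recipe.

```python
import time
from typing import Callable, Dict, Iterable


def warm_cache(search: Callable[[str], object],
               queries: Iterable[str]) -> Dict[str, float]:
    """Run each query once and return how long each one took (seconds).

    The results are thrown away; the point is only to make the OS read
    (and therefore cache) the index blocks those queries touch.
    """
    timings = {}
    for text in queries:
        start = time.perf_counter()
        search(text)
        timings[text] = time.perf_counter() - start
    return timings


def make_xapian_search(db_path: str, limit: int = 10) -> Callable[[str], object]:
    """Build a search callable backed by a Xapian database (hypothetical helper)."""
    import xapian  # imported lazily so warm_cache() works without the bindings

    db = xapian.Database(db_path)
    parser = xapian.QueryParser()
    parser.set_database(db)

    def search(text: str):
        enquire = xapian.Enquire(db)
        enquire.set_query(parser.parse_query(text))
        return enquire.get_mset(0, limit)

    return search


# Example use (path and queries are placeholders):
#   search = make_xapian_search("/path/to/db")
#   print(warm_cache(search, ["harry potter", "lord of the rings"]))
```

Running something like this after a reboot, or after whatever job evicts the cache, should leave the upper Btree levels cached before real users arrive.
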
> Is the index not written to RAM cache upon indexing?  If it is, adding
> more RAM would be sufficient then?

The OS will probably cache data as it is written out to allow it to be
reread without hitting disk to do so, so more RAM will help if you're
losing cached data too soon due to lack of RAM.
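
To get a rough feel for whether RAM is the limiting factor, you could compare the on-disk size of the index with the machine's memory.  The sketch below is only an approximation under a few assumptions: the helper names are made up, it reads Linux-style /proc/meminfo text, and it uses MemTotal as a crude upper bound even though the page cache shares that memory with everything else on the box.

```python
import os
from typing import Dict


def directory_size(path: str) -> int:
    """Total size in bytes of all regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo-style text into {field: number}.

    Most fields are reported in kB; a few (such as HugePages_Total)
    are plain counts and are returned as-is.
    """
    info = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def index_fits_in_ram(index_path: str, meminfo_text: str) -> bool:
    """Crude check: is the index smaller than total physical memory?"""
    mem_total_bytes = parse_meminfo(meminfo_text)["MemTotal"] * 1024
    return directory_size(index_path) < mem_total_bytes


# Example use on Linux (index path is a placeholder):
#   with open("/proc/meminfo") as f:
#       print(index_fits_in_ram("/path/to/db", f.read()))
```

If the index is much larger than RAM, no amount of warming will keep all of it cached, though the upper Btree levels are small and should still stay resident under steady query load.
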
If you're on Linux, there's usually a cron job to update the locate
database which runs each night - you might want to disable that as it
reads every directory on the disk which tends to flush any previously
cached data.  Disabling this means that "locate" won't work, which is
usually not too much of an issue for a server.  There may be other
standard scheduled jobs with similar behaviour.

Cheers,
    Olly