On 19 Apr 2018, at 09:49, Miao LIU <miaoliu95 at acm.org> wrote:
> I have been using Xapian (https://xapian.org) for quite a few months as the
> database and the searching engine in my IR system. I am currently facing
> a tough problem that I'd like to put all my posting lists in memory of
> my high-end machine. I went through all related tutorials, documents and
> materials of the Xapian but I found nothing except a class called
> "InMemoryDatabase" and a related issue
> (https://trac.xapian.org/ticket/59#no1) without
> updating in recent years. As for the class "InMemoryDatabase", I
> found pages of tutorials then tried and drew the conclusion that the
> "InMemoryDatabase"
> unfortunately can not out-performs the general DiskDatabase mode.

Hi Miao. InMemoryDatabase was never intended as a high-performance approach,
but rather for testing and certain simpler uses. If you want to improve
performance with a large amount of memory, then either a. rely on your operating
system, or b. coerce your operating system into being more aggressive about
using memory instead of disk access.

For a., you're basically relying on the virtual memory system of your OS.
In many cases you'll want to tune the kernel parameters to encourage it to
keep more of the database in the kernel's file system buffers. (I'm not
up to date on how to do this, and in any case advice tends to need to be
somewhat specific to your setup, but I'm sure there are answers on one of
the Stack Exchange sites or similar that can point you in the right direction.)

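One thing that goes along with a., and which is not Xapian-specific, is to read the database files once after boot so the OS has a chance to pull them into its file system buffers. A minimal sketch in Python, assuming the database is an ordinary directory of files; the function name and chunk size are made up for illustration, and whether the pages stay cached is still up to the kernel:

```python
import os


def warm_page_cache(db_dir, chunk_size=1 << 20):
    """Read every regular file under db_dir once; return total bytes read.

    Hypothetical helper: it only touches the files so the OS may cache
    them. It does not pin anything in memory.
    """
    total = 0
    for root, _dirs, files in os.walk(db_dir):
        for name in files:
            path = os.path.join(root, name)
            if not os.path.isfile(path):
                continue  # skip broken symlinks and special files
            with open(path, "rb") as f:
                while True:
                    chunk = f.read(chunk_size)
                    if not chunk:
                        break
                    total += len(chunk)
    return total


# Example (the path is a placeholder):
# warm_page_cache("/path/to/xapian/db")
```

If the database is larger than the memory the kernel is willing to use for caching, this will not help much, which is where the tuning comes in.
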
For b., create a ram disk and put the Xapian database in there. If you really
need to, you could probably use replication to ensure there's an on-disk
version lying around somewhere. You need to be able to fit the entire database
in there, though. (Unless you aren't updating the database at the same time,
in which case I guess you could symlink from the non-ramdisk database
directory to the posting list table files on the ram disk.)

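The symlink idea could look roughly like the following. This is an untested sketch, not something I have run against Xapian: it assumes a read-only database, that the ram disk is already mounted at a directory you pass in, and that the posting list table files can be recognised by a "postlist." filename prefix (check what your backend actually calls them). It leaves the original directory alone and builds a separate directory of symlinks to open instead; all the names here are hypothetical.

```python
import os
import shutil


def build_serving_dir(db_dir, ram_dir, serve_dir, prefix="postlist."):
    """Create serve_dir as a directory of symlinks mirroring db_dir.

    Files whose names start with prefix are copied into ram_dir and
    linked from there; everything else links back to the on-disk file.
    Returns the names of the files that were copied into ram_dir.
    """
    os.makedirs(ram_dir, exist_ok=True)
    os.makedirs(serve_dir, exist_ok=True)
    in_ram = []
    for name in sorted(os.listdir(db_dir)):
        src = os.path.abspath(os.path.join(db_dir, name))
        if not os.path.isfile(src):
            continue  # only plain files are mirrored
        if name.startswith(prefix):
            # Assumed posting list table file: copy it onto the ram disk.
            target = os.path.abspath(os.path.join(ram_dir, name))
            shutil.copy2(src, target)
            in_ram.append(name)
        else:
            target = src
        link = os.path.join(serve_dir, name)
        if os.path.lexists(link):
            os.remove(link)  # replace a stale link from an earlier run
        os.symlink(target, link)
    return in_ram


# Example (all paths are placeholders):
# build_serving_dir("/data/xapian/db", "/mnt/ramdisk/db", "/data/xapian/serve")
```

You would then open the serving directory rather than the original one, and rebuild it after every reboot since the ram disk contents are gone by then.
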
J
--
James Aylett
devfort.com — spacelog.org — tartarus.org/james/