On Thu, Oct 14, 2010 at 06:51:28AM +0200, Jesper Krogh wrote:
> I'm struggling a bit with getting the Xapian indexes safely stored on
> tape.
> LVM snapshots are not really an option given the performance penalties
> of that technology. And the index often encounters writes in the
> time it takes for the backup to copy off the files. Thus the
> index is potentially unsafe on tape.
There's some discussion here:
http://xapian.org/docs/admin_notes.html#backup-strategies
> I don't know much about Xapian's internal versioning of data, but
> I suspect it is something like PostgreSQL's MVCC system. Would
> so my theory was: If no ->flush "has ended" while the backup was
> running, then the index would be safe, since the last thing to do
> is more or less to set the revision flag to the new version, which
> all have to co-exist with a fully functional version?
>
> Could we make a hook in Xapian so the backup could signal
> to the indexer that it is allowed to ->flush the index, but
> should hold off the "revision flag" until the backup signals that
> it is done.. Wouldn't that ensure a fully working copy all the time?
Yes, that should work (though it's better to think of it as "commit" than
flush, and we've renamed the method now to reflect this).
The obvious approach is a second lock which is taken when actually
committing, which should be only for a short interval of time. The backup
would hold this lock (for quite a long time) to prevent changes being
committed.
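To make the second-lock idea concrete, here is a minimal sketch in Python. It is not something Xapian provides: the lock file, `commit_lock()`, `backup()` and `commit()` are all hypothetical names, and it assumes a POSIX system where `fcntl.flock()` is available and that the indexer cooperates by taking the same lock around each commit.

```python
import fcntl
import os
import shutil
from contextlib import contextmanager


@contextmanager
def commit_lock(lock_path, blocking=True):
    """Hold an exclusive advisory lock on lock_path (a hypothetical file)."""
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        flags = fcntl.LOCK_EX if blocking else fcntl.LOCK_EX | fcntl.LOCK_NB
        fcntl.flock(fd, flags)  # raises BlockingIOError if non-blocking and held
        yield
    finally:
        os.close(fd)  # closing the descriptor releases the lock


def backup(db_dir, dest_dir, lock_path):
    """Copy the database while holding the lock, so no commit can land."""
    with commit_lock(lock_path):
        shutil.copytree(db_dir, dest_dir)


def commit(lock_path, do_commit):
    """Indexer side: take the lock only for the short commit step."""
    with commit_lock(lock_path):
        do_commit()
```

The indexer would keep adding documents as usual and only block in `commit()` while a backup is running; the backup holds the lock for the whole copy.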
More crudely, you could just open the database for writing while making
the backup, which would completely prevent other writers.
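A rough sketch of the cruder approach, assuming the Xapian Python bindings are installed. `backup_while_holding_writer()` and its `open_writer` parameter are made up for illustration; the parameter is only there so the copy logic can be tried without a real database. Any other writer that tries to open the database during the copy should fail with a lock error.

```python
import shutil


def backup_while_holding_writer(db_path, dest_path, open_writer=None):
    """Copy db_path to dest_path while holding the database's write lock."""
    if open_writer is None:
        import xapian  # needs the Xapian Python bindings

        def open_writer(path):
            # Opening for writing takes the write lock; nothing is modified
            # unless we make changes and commit them.
            return xapian.WritableDatabase(path, xapian.DB_OPEN)

    writer = open_writer(db_path)
    try:
        shutil.copytree(db_path, dest_path)
    finally:
        writer.close()  # releases the write lock
```

The obvious cost is that indexing is blocked completely for the duration of the copy, rather than just commits.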
A rather different approach (which would allow for incremental backups)
would be to reuse the replication infrastructure and back up the files
with the incremental changes in. That reduces the backup time, but
the restore time would probably be longer. I guess you'd still want a way
to perform the occasional full backup anyway.
Or replicate and back up from the replica. You just need to stop the
replication while you make a backup. That doubles your disk space
requirements, but it does mean you also have a "hot" backup (unless
the replication replicates whatever broke the live database - if it's
accidentally deleting a slew of documents, it certainly would).
Cheers,
Olly