maz
2007-Jan-17 17:25 UTC
[Ferret-talk] Dump and load functionalities? Test patch provided
Hello everyone,

We need to create backups of our index, but there are a few constraints:

- our application shouldn't go offline for that
- it has to be done quickly

Ferret doesn't seem to have this kind of functionality (though I'm very new to Ferret, so I may be wrong). I also found that I couldn't do it in plain Ruby, because it's far too slow (try it with an index of 2,000,000+ documents). So the only option left was to add support for it to Ferret itself. I added these two features:

- IndexReader#dump("file")
  Dumps the whole index to a binary, non-portable file.
- IndexWriter#load("file")
  Loads such a file and appends its contents to the current index.

I wrote a somewhat dirty patch for Ferret 0.10.14 (it works with 0.10.13 too). You can find it here: http://pastie.caboo.se/33769

The fact that the dump file format is binary and homemade doesn't really matter to me, as long as it's fast. It's probably not very safe either, though, because my code does very few security checks.

Basically, the dump file format is a sequence of per-document records:

<int - number of hash entries for document #0 (2)>
<int - size of key> <char [] - key data ("id")>
<int - size of value> <char [] - value data ("test")>
<int - size of key> <char [] - key data ("foo")>
<int - size of value> <char [] - value data ("bar")>
<int - number of hash entries for document #1>
...

- "int" is the C int in the native byte order and size, so a file can only be loaded safely on the same architecture.
- Hash keys are converted to/from symbols during dump/load.
- Strings are stored without a terminating \0.
- Sizes are in bytes.
- Of course, it's all packed together; it's binary.

Now, about the feature itself: is there another, better way to do this? If not, could it find its place in Ferret (probably after some code cleanup, or even with a portable file format)?

Also, I don't really know what to do about locks or mutexes. I didn't put any into my code, and I couldn't figure out how Ferret handles thread safety. Any ideas?
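The record layout described above can be sketched in C roughly as follows. This is a minimal illustration under stated assumptions, not code from the patch at http://pastie.caboo.se/33769: the DumpDoc type and the write_str/dump_doc/read_str/load_doc helpers are hypothetical names, and keys and values are plain C strings rather than Ferret's Ruby symbols and strings. As in the format described, every int is written in native size and byte order, so a file is only readable on the same architecture.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory document: parallel arrays of keys and values.
 * Illustrative only; this is not Ferret's internal representation. */
typedef struct {
    int count;
    const char **keys;
    const char **values;
} DumpDoc;

/* Write one string: native int byte length, then the bytes, no trailing '\0'. */
static int write_str(FILE *f, const char *s) {
    int len = (int)strlen(s);
    if (fwrite(&len, sizeof len, 1, f) != 1) return -1;
    if (len > 0 && fwrite(s, 1, (size_t)len, f) != (size_t)len) return -1;
    return 0;
}

/* Write one document record: entry count, then each key/value pair. */
static int dump_doc(FILE *f, const DumpDoc *doc) {
    if (fwrite(&doc->count, sizeof doc->count, 1, f) != 1) return -1;
    for (int i = 0; i < doc->count; i++) {
        if (write_str(f, doc->keys[i]) || write_str(f, doc->values[i])) return -1;
    }
    return 0;
}

/* Read one length-prefixed string into a malloc'd, NUL-terminated buffer.
 * Returns NULL on a short read or a negative length. */
static char *read_str(FILE *f) {
    int len;
    if (fread(&len, sizeof len, 1, f) != 1 || len < 0) return NULL;
    char *buf = malloc((size_t)len + 1);
    if (buf == NULL) return NULL;
    if (len > 0 && fread(buf, 1, (size_t)len, f) != (size_t)len) {
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    return buf;
}

/* Read one document record into caller-provided arrays (caller frees the
 * strings). Returns 1 and sets *n on success, 0 when no count could be read
 * (end of file), and -1 on a bad count or truncated data. */
static int load_doc(FILE *f, char **keys, char **values, int max, int *n) {
    int count;
    if (fread(&count, sizeof count, 1, f) != 1) return 0;
    if (count < 0 || count > max) return -1;  /* basic sanity check */
    for (int i = 0; i < count; i++) {
        keys[i] = read_str(f);
        values[i] = keys[i] ? read_str(f) : NULL;
        if (values[i] == NULL) {
            /* Free this partial pair and every complete pair read so far. */
            free(keys[i]);
            for (int j = 0; j < i; j++) { free(keys[j]); free(values[j]); }
            return -1;
        }
    }
    *n = count;
    return 1;
}
```

A loader built on this would call load_doc in a loop until it returns 0, adding each document to the index as it goes. Rejecting negative or oversized counts, as load_doc and read_str do, is one cheap safety check for a homemade binary format.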
Thanks,

--
Maz
Rift Technologies - http://rift.fr/

--
Posted via http://www.ruby-forum.com/.
Sam Giffney
2007-Jan-18 01:41 UTC
The easiest way would just be to copy/zip the directory that the index is in.

maz wrote:
> Hello everyone,
>
> We need to create backups of our index, but there are a few
> constraints:
>
> - our application shouldn't go offline for that
> - it has to be done quickly
[snip]
> Maz
> Rift Technologies - http://rift.fr/
maz
2007-Jan-18 10:35 UTC
Sam Giffney wrote:
> The easiest way would just be to copy/zip the directory that the index
> is in.

Isn't there a risk of data loss? I mean, if Ferret is already using the index, I can't just copy it like that, because locks and buffered data *may* leave the index in a bad state.

--
Maz
Rift Technologies - http://rift.fr/