I am using lighttpd with two processes, and occasionally Ferret does not properly remove its .lock file, at which point my application throws nothing but 500 errors. I have therefore decided to go with a single writer thread, which is probably a better long-term solution anyway. I would like some feedback on the best way to structure this.

My app is hosted on TextDrive, so DRb (Distributed Ruby) is not allowed. The only other solution I can come up with is to write all pending updates to a shared file. This could involve either:

1) Serializing each object to a file using something like YAML, and having the writer deserialize them during updates.

2) Writing only the ids that need to be updated in the index, and having the writer read each object fresh from the database by its id when updating the index.

I am leaning towards solution 2, as it is easier to implement, should be faster to write to and read from the intermediate file, and makes it easier to remove duplicate index updates. The only drawback to 2 is that it requires one additional database read for every index update, but this could be minimized by batch reading with a "where id in (...)".

Also, both 1 and 2 will require a lockfile for managing concurrent access to the intermediate file. I am thinking of just using this lockfile library:

http://raa.ruby-lang.org/project/lockfile/

Does anyone have any experience with this?

Thanks,
Tom
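For reference, solution 2 could look roughly like the sketch below. It is only an illustration under some assumptions: it uses Ruby's built-in File#flock (an advisory lock, which may behave differently on network filesystems) rather than the lockfile library linked above, and the helper names enqueue_id and drain_ids and the queue file name are made up for this example.

```ruby
require "tmpdir"

# Hypothetical helper: a web process appends the id of a changed record.
def enqueue_id(path, id)
  File.open(path, File::WRONLY | File::APPEND | File::CREAT) do |f|
    f.flock(File::LOCK_EX) # wait until no other process holds the lock
    f.puts(id)
    f.flush                # make sure the line is written before unlocking
  end                      # closing the file releases the lock
end

# Hypothetical helper: the single writer reads all pending ids, empties
# the file, and returns the ids with duplicates removed.
def drain_ids(path)
  File.open(path, File::RDWR | File::CREAT) do |f|
    f.flock(File::LOCK_EX)
    lines = f.readlines.map { |line| line.strip }.reject { |s| s.empty? }
    f.truncate(0)
    lines.map { |s| s.to_i }.uniq
  end
end

# Demo with a throwaway queue file (the name is an assumption).
queue = File.join(Dir.mktmpdir, "pending_ids.txt")
[3, 7, 3, 12].each { |id| enqueue_id(queue, id) }
pending = drain_ids(queue)
```

The writer would then fetch the records for the returned ids in one batch (the "where id in (...)" query) and update the index from them.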
Tom Davies wrote:
> 2) just write the ids that need to be updated in the index and then
> read each object fresh from the database using its id when updating
> the index.
>
> I am leaning towards solution 2, as it is easier to implement, should
> be faster to write and read from the intermediate file and will be
> easier to remove duplicate index updates. The only drawback to 2 is
> it will require one additional database read for every index update...
> but this could be minimized by batch reading with a where id in (...).

Why not add a needs_indexing column to your object table? That way, not only do you not have to care about concurrent access to an intermediate file (because the DB takes care of that for you), but you can also do all your pending database reads at once, if that's appropriate. If you've got a single writer thread, it can clear the flag either on all the records once it's done, or on each one as it goes. It seems much simpler all round to me...

Of course, if you don't want to change your object table schema, then you could create a separate table specifically for this.

--
Alex
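As a rough illustration of the needs_indexing idea, the sketch below uses plain in-memory Ruby objects as stand-ins for both the database table and the Ferret index, so it makes no claims about the real ActiveRecord or Ferret APIs. The names Record and index_pending and the sample data are hypothetical.

```ruby
# Stand-in for a row in the object table, with the proposed flag column.
Record = Struct.new(:id, :title, :needs_indexing)

# Hypothetical single-writer pass: select every flagged row in one go,
# add it to the index, and clear the flag on each row as it goes.
def index_pending(table, index)
  # In a real database this would be one query along the lines of
  #   SELECT * FROM objects WHERE needs_indexing = 1
  batch = table.select { |r| r.needs_indexing }
  batch.each do |r|
    index[r.id] = r.title    # stand-in for updating the search index
    r.needs_indexing = false # stand-in for writing the flag back
  end
  batch.size
end

table = [
  Record.new(1, "first post", true),
  Record.new(2, "second post", false),
  Record.new(3, "third post", true)
]
index = {}
indexed_count = index_pending(table, index)
```

Clearing the flag per row, as here, corresponds to the "on each as it goes" variant; clearing it for the whole batch at the end would be the other option mentioned.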
That is an excellent idea, Alex. Not sure why I didn't think of that :)

Basically, your concept is like adding a dirty flag to my table. I like this approach much better. However, for my particular case, I will modify it slightly to just use the existing updated_at columns that I have for each of my models that need indexing. Then my index writer won't have to lock the model database tables to reset the dirty flag. It will just keep track of the last time it updated the index.

Thanks for finding a much simpler solution. That .lock file way was making me nervous :)

Tom

On 3/5/06, Alex Young <alex at blackkettle.org> wrote:
> Why not add a needs_indexing column to your object table?
Tom Davies wrote:
> Basically, your concept is like adding a dirty flag to my table.

Pretty much - it's dirty within a specific context.

> I like this approach much better. However, for my particular case, I
> will modify it slightly to just use the existing updated_at columns
> that I have for each of my models that need indexing. Then my index
> writer won't have to lock the model database tables to reset the dirty
> flag. It will just keep track of the last time it updated the index.

Sounds good. Just remember to record the *start* of the write, not the end - otherwise you'll get records being marked as updated while your write's happening, and they'll get missed by the next update.

> Thanks for finding a much simpler solution. That .lock file way was
> making me nervous :)

No worries :-)

--
Alex
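The point about recording the start of the write can be sketched as follows. Again this is only an illustration with in-memory stand-ins for the table and the index; the names Row and IndexWriter are hypothetical, and the timestamps are passed in explicitly so the sequence of events is easy to follow.

```ruby
# Stand-in for a model row with an updated_at column.
Row = Struct.new(:id, :body, :updated_at)

class IndexWriter
  attr_reader :index

  def initialize
    @last_run_started_at = Time.at(0) # nothing indexed yet
    @index = {}
  end

  # Index every row changed since the previous run *started*.
  # Returns the ids that were (re)indexed.
  def run(rows, now = Time.now)
    started_at = now # capture the time before reading any rows
    # >= re-indexes rows on the boundary instead of risking a miss
    changed = rows.select { |r| r.updated_at >= @last_run_started_at }
    changed.each { |r| @index[r.id] = r.body }
    @last_run_started_at = started_at
    changed.map { |r| r.id }
  end
end

rows = [Row.new(1, "a", Time.at(900)), Row.new(2, "b", Time.at(950))]
writer = IndexWriter.new
first_run = writer.run(rows, Time.at(1000))

# Row 1 changes at t=1002, while the first run (started at t=1000) is
# still in progress. Because the start time was recorded, the next run
# still picks it up.
rows[0].body = "a2"
rows[0].updated_at = Time.at(1002)
second_run = writer.run(rows, Time.at(1010))
```

Had the writer stored the time the first run finished (say t=1004) instead, the change at t=1002 would fall before the recorded time and be skipped.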