Hey, I finally get to ask a question!

One of the mildly irritating things about Ferret was that it was impossible to update the labels of a message without updating the entire entry, i.e. including the body. So updating the labels of a message and saving that to disk required either re-loading the body from the source, or keeping the body explicitly in the index so that it could be loaded without going back to the source.

The latter approach is used by the current Ferret index implementation, since it's significantly faster (especially for slow sources like IMAP servers), but at the cost of a lot of disk space.

My understanding of Xapian is that this is also the case, since fields are essentially represented as prefixed terms, and so you're basically updating a big blob, but I wanted to confirm this. I ask because the entries.db file is very big. :)

--
William <wmorgan-sup at masanjin.net>

Excerpts from William Morgan's message of Mon Jul 27 13:45:32 -0400 2009:
> Hey, I finally get to ask a question!
>
> One of the mildly irritating things about Ferret was that it was
> impossible to update the labels of a message without updating the entire
> entry, i.e. including the body. So updating the labels of a message and
> saving that to disk required either re-loading the body from the source,
> or keeping the body explicitly in the index so that it could be loaded
> without going back to the source.
>
> The latter approach is used by the current Ferret index implementation,
> since it's significantly faster (especially for slow sources like IMAP
> servers), but at the cost of a lot of disk space.
>
> My understanding of Xapian is that this is also the case, since fields
> are essentially represented as prefixed terms, and so you're basically
> updating a big blob, but I wanted to confirm this. I ask because the
> entries.db file is very big. :)

Xapian actually provides add_term and remove_term for documents. I'd definitely like to use these for label updates, but we need a way to tell if only the labels have changed in sync_message. Or, we update the index in Message#add_label/etc and get rid of the need to save buffers. That might not be an option for the Ferret index, though.

We don't store the body in entries.db, just enough info for thread-index-mode. It's only about 800 bytes/message for me, but I don't have snippets enabled so yours would be larger.

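A minimal sketch of the "only the labels changed" idea from the message above: compute which label terms would need to be added and which removed, so that only those terms are passed to add_term and remove_term instead of re-indexing the whole entry. The helper name label_term_changes and the "L" term prefix are assumptions made for illustration, not Sup's actual code, and no Xapian calls are made here.

```ruby
# Hypothetical prefix for label terms; Sup's real prefix may differ.
LABEL_PREFIX = "L"

# Returns the prefixed terms to add and to remove when a message's
# labels change from old_labels to new_labels.
def label_term_changes(old_labels, new_labels)
  old_terms = old_labels.map { |l| LABEL_PREFIX + l.to_s }.uniq
  new_terms = new_labels.map { |l| LABEL_PREFIX + l.to_s }.uniq
  {
    add: (new_terms - old_terms).sort,    # would go to add_term
    remove: (old_terms - new_terms).sort  # would go to remove_term
  }
end

changes = label_term_changes([:inbox, :unread], [:inbox, :starred])
puts "add: #{changes[:add].inspect}, remove: #{changes[:remove].inspect}"
```

If both lists come back empty, the labels are unchanged and no index write is needed for them.
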
Reformatted excerpts from Rich Lane's message of 2009-07-28:
> Xapian actually provides add_term and remove_term for documents.

Excellent.

> I'd definitely like to use these for label updates, but we need a way
> to tell if only the labels have changed in sync_message.

I've been running into this same issue with my sup-server experiments, so I think we should split the API into, say, three separate calls: something like add_new_message, update_labels, and update_body. (AFAIK the only client of update_body is in some of the draft editing stuff.) WDYT?

> Or, we update the index in Message#add_label/etc and get rid of the
> need to save buffers. That might not be an option for the Ferret
> index, though.

I think that would actually be fine for Ferret, and it's a direction that's often been discussed. (Especially now that we have undo.) If we do the above, we can certainly do this as a later step.

> We don't store the body in entries.db, just enough info for
> thread-index-mode. It's only about 800 bytes/message for me, but I
> don't have snippets enabled so yours would be larger.

On second glance, it's a little smaller than I remembered. For my sample 212m mbox, it's about 20m with snippets enabled. The total index size under Xapian (the xapian/ dir and all gdbm files) is larger than the original mbox file, which seems a little insane. But hey, disk space is cheap.

--
William <wmorgan-sup at masanjin.net>

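To make the proposed three-call split concrete, here is a rough in-memory sketch. The method names add_new_message, update_labels, and update_body come from the message above; the class name SketchIndex, the Entry struct, the argument lists, and the reader methods are all assumptions for illustration, and a Hash stands in for the real Ferret or Xapian index.

```ruby
# Toy index: a Hash keyed by message id stands in for the real backend.
class SketchIndex
  Entry = Struct.new(:body, :labels)

  def initialize
    @entries = {}
  end

  # Index a message that has not been seen before.
  def add_new_message(id, body, labels = [])
    raise ArgumentError, "message #{id} is already indexed" if @entries.key?(id)
    @entries[id] = Entry.new(body, labels.uniq)
  end

  # Change only the labels; the stored body is left untouched.
  def update_labels(id, labels)
    entry(id).labels = labels.uniq
  end

  # Change only the body (e.g. draft editing); labels are left untouched.
  def update_body(id, body)
    entry(id).body = body
  end

  def labels_for(id)
    entry(id).labels
  end

  def body_for(id)
    entry(id).body
  end

  private

  def entry(id)
    @entries.fetch(id) { raise KeyError, "message #{id} is not indexed" }
  end
end
```

With separate entry points, the caller states which part changed, so the index no longer has to work out inside sync_message whether only the labels differ.
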
I tried out using add_term/remove_term for immediate label changes. It's significantly faster than sync_message, but it still makes the interface feel laggy. There's known room for improvement in Xapian's replace_document. However, we'll still have a lot of latency when we start using remote sup-servers, so I don't think it's a good idea to do these index operations synchronously with the UI.

We could queue up index writes and execute them in a background thread. We'd want label additions to show up immediately in a search, though. This is easy to do for inbox-mode and label-view-mode, which covers most of my daily usage.

If/when we support multiple clients connecting to a sup-server, we'll need a way to notify them that someone else modified a message. We can implement a simple version of this now that notifies search-results-mode after the write completes.

If we're getting rid of buffer saving, it'd probably be easiest to use a weak-ref table so we keep at most 1 copy of each message in memory - this would make updating messages across buffers simpler.

How is sup-server development going?

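A rough sketch of the weak-ref table idea mentioned above, using Ruby's standard WeakRef class. The names MessageCache and CachedMessage and the block-based loader are assumptions for illustration, not Sup's code. While any buffer still holds a message, every lookup of its id returns that same object; once nothing else references it, the garbage collector may reclaim it and the next lookup reloads it.

```ruby
require 'weakref'

# Stand-in for Sup's message object (hypothetical, for illustration only).
CachedMessage = Struct.new(:id, :labels)

class MessageCache
  def initialize
    @refs = {} # message id => WeakRef to the live message object
  end

  # Returns the live object for id if one still exists; otherwise calls
  # the block to load it and remembers a weak reference to the result.
  def fetch(id)
    ref = @refs[id]
    if ref
      begin
        return ref.__getobj__
      rescue WeakRef::RefError
        # The old object was garbage collected; fall through and reload.
      end
    end
    message = yield id
    @refs[id] = WeakRef.new(message)
    message
  end
end
```

Because every buffer sees the same object, a label change made in one buffer is visible in the others without copying. This sketch never prunes dead entries from the table, which a real implementation would probably want to do.
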
Reformatted excerpts from Rich Lane's message of 2009-07-31:
> I tried out using add_term/remove_term for immediate label changes.
> It's significantly faster than sync_message,

Excellent.

> but it still makes the interface feel laggy. There's known room for
> improvement in Xapian's replace_document. However, we'll still have a
> lot of latency when we start using remote sup-servers, so I don't
> think it's a good idea to do these index operations synchronously with
> the UI.

I agree, synchronous is not an option.

> We could queue up index writes and execute them in a background
> thread. We'd want label additions to show up immediately in a search,
> though. This is easy to do for inbox-mode and label-view-mode, which
> covers most of my daily usage.

I'm fine with queuing up index writes and letting the user continue while they take effect in the background. I'm also fine with the easier option of just blocking during a search until the writes are complete.

> If/when we support multiple clients connecting to a sup-server, we'll
> need a way to notify them that someone else modified a message.

I think this is more of a nice-to-have than a necessity, but it would be nice to have, even if it was a "we've detected a change somewhere on the internet; reload? (y/n)"-kinda thing.

> How is sup-server development going?

Well. I have a simple version that stores "items" to files on disk, and uses Ferret to provide the search semantics. It's modular enough that upgrading to Xapian shouldn't be as painful as it was with Sup. There are even unit tests that enforce the semantics of the modules. Go me.

I'm going to make a couple internal API changes in Sup and then try throwing the code together.

--
William <wmorgan-sup at masanjin.net>
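A minimal sketch of the "easier option" discussed above: index writes go onto a queue drained by a background thread, and a search calls flush first so it blocks until all pending writes are done. The class name IndexWriteQueue, its methods, and the shape of the queued operations are assumptions for illustration; the block passed to the constructor stands in for the real index write.

```ruby
class IndexWriteQueue
  STOP = Object.new # sentinel telling the worker thread to exit

  # writer: a block that performs one index write.
  def initialize(&writer)
    @writer = writer
    @queue = Queue.new
    @pending = 0
    @lock = Mutex.new
    @idle = ConditionVariable.new
    @worker = Thread.new { run }
  end

  # Called from the UI thread; returns immediately.
  def enqueue(op)
    @lock.synchronize { @pending += 1 }
    @queue << op
  end

  # Blocks until every write queued so far has finished.
  # A search would call this before querying the index.
  def flush
    @lock.synchronize { @idle.wait(@lock) while @pending > 0 }
  end

  # Drains any remaining writes, then stops the worker thread.
  def close
    @queue << STOP
    @worker.join
  end

  private

  def run
    loop do
      op = @queue.pop
      break if op.equal?(STOP)
      begin
        @writer.call(op)
      rescue StandardError => e
        warn "index write failed: #{e.message}"
      ensure
        @lock.synchronize do
          @pending -= 1
          @idle.broadcast if @pending.zero?
        end
      end
    end
  end
end
```

The UI thread calls enqueue and carries on; only a search pays the cost of waiting, and only when writes are still outstanding. Making label changes show up in inbox-mode before the write lands would need a separate in-memory overlay, which this sketch does not attempt.
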