Weixian Zhou
2012-Apr-14 23:02 UTC
[Xapian-devel] [xapian] a bug fixed in brass_database.cc
Hi all, I fixed a bug in brass_database.cc. The bug is: *FIXME: this should be done by checking memory usage, not the number of* *changes. We could also look at the amount of data the inverter object* *currently holds.* I also modified the simpleindex.cc so that it now supports batch files indexing. -- Weixian Zhou Department of Computer Science and Engineering University at Buffalo, SUNY -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.xapian.org/pipermail/xapian-devel/attachments/20120414/689f53a9/attachment.html> -------------- next part -------------- A non-text attachment was scrubbed... Name: brass_database.cc.patch Type: application/octet-stream Size: 3903 bytes Desc: not available URL: <http://lists.xapian.org/pipermail/xapian-devel/attachments/20120414/689f53a9/attachment-0004.obj> -------------- next part -------------- A non-text attachment was scrubbed... Name: simpleindex.cc Type: application/octet-stream Size: 4458 bytes Desc: not available URL: <http://lists.xapian.org/pipermail/xapian-devel/attachments/20120414/689f53a9/attachment-0005.obj>
On Sat, Apr 14, 2012 at 07:02:58PM -0400, Weixian Zhou wrote:> I fixed a bug in brass_database.cc.Thanks for your patches. One minor but important issue - your indentation doesn't match the existing code. Set your editor to indent by 4 spaces, with tabs displayed as 8 spaces, and indents of 8 or more spaces tab-filled. Full details are in HACKING. Also, in a patch, it's better to just remove old code rather than commenting it out. If people want to see the old code, they can look in the version control system. Looking at the flush threshold patch first, don't put your initials or name in comments you add (or at least not as a matter of course) - if people want to know who wrote a method, the version control system can tell them (e.g. git blame or svn blame). Updating copyright attribution is the big exception to that. It's better to make comments like that one in brass_inverter.h doxygen format comments so they appear in the generated documentation for internal APIs. Mostly that means use "/**" or "///" to start them, though there are various formatting rules too. There's no need to define a destructor if it's empty - you get an empty destructor by default. Inverter::get_inverter_size() is just an accessor method, so should really be const. The tracking of posting_count seems incorrect to me: * You increment it for each call to add_posting() (OK so far) * But you don't increment it at all for remove_posting(), which often also adds entries. * You only decrease it by own when a posting list is written out, not by the number of entries written. * The default threshold of 10000 is going to be too low now. It would also really be better (though much harder) to track the memory actually used, since that's what matters more to users. You don't remove the existing change_count variable, but just remove the code which updates it when documents are added or updated or removed, so it will always be zero (unless the user asks for allterms) which will cause various issues (like commit() not actually flushing changes). This causes a number of testcases to fail with your changes applied - try "make check" to see.> I also modified the simpleindex.cc so that it now supports batch files > indexing.Interesting, but we already have omindex for indexing a directory tree, while simpleindex is intended to be *simple*. So I'm not sure we want to change it to scan directories, particularly since the code to do that depends on the OS (we have a adaptor layer for this for the MSVC build, but examples should ideally not have to drag in such code as it makes them harder to understand). A list of text files on the command line might be a reasonable alternative (or perhaps better as a separate example). Cheers, Olly