On Fri, Dec 22, 2006 at 12:31:08PM -0500, Alex Kushkuley wrote:> If I understood correctly, the Xapian JNI layer keeps the "C-shadows" > of all the org.xapian Java objects in hash maps untill objects are > finalized by Java garbage collection. The java finalize method for an > org.xapian object is overriden and among other things it calls native > method that removes the object's "C-shadow" from the hashmap.Eric's the real authority on this (since he wrote it) but that sounds about right to me.> This is done in a synchronized fashion (under pthread_lock). It seems > to be working fine, however when large application runs in a tight > loop it is slowed down by garbage collection.I'm not sure I see why this would slow things down (unless it's an SMP machine), but there is some overhead to this locking, and it complicates building on some platforms (we don't automatically pass any thread-specific flags to the compiler, but e.g. OSF/1 on alpha needs "-pthread" so the user has to know to force this currently).> Using incremental garbage collection (-Xincgc ) seems to fix this > problem, however: > > The spec. for java "finalize" method states that > > "The Java programming language does not guarantee which thread will > invoke the finalize method for any given object. It is guaranteed, > however, that the thread that invokes finalize will not be holding any > user-visible synchronization locks when finalize is invoked." > > This seems to imply that synchronizing inside garbage collectior is > not recommended.I think you're reading too much into the quoted text. To me it just says that the thread won't be holding any user-visible locks already, not that you can't take out locks. In fact, it seems to me it must be telling you this precisely so you know that it is safe to take out locks without worrying about existing locks the thread might already have.> I don't know if that can be helped, since access to stl hasn_map must > be synchronized. > On the other hand, I understand that reconstructing java objects on > "C-side" is probably not a good idea either. > > My first thought was to implement explicit "destroy" methods on java > objects in org.xapian so that it will be an invoker responsibility to > "destroy" an object when it's no longer needed > (much like object on a heap in C/C++).I'm not really a Java programmer, but unless such methods are common in other class libraries, I think this would result in an unnatural Java API. While WritableDatabase would benefit from having a close() method, it shouldn't be required to call it.> Can I ask for your opinion on that ?The way SWIG handles this is to store the pointer to the C++ class in the jlong directly, which avoids the need for using a map. It's also clearly faster than using a map, though I don't know if the speed difference would actually be measurable in real code. I think that would be a better approach. Or we could leave this until we move to using SWIG for the Java bindings, since then it won't be an issue. Cheers, Olly
On Dec 22, 2006, at 9:02 PM, Olly Betts wrote:> On Fri, Dec 22, 2006 at 12:31:08PM -0500, Alex Kushkuley wrote:>> This is done in a synchronized fashion (under pthread_lock). It >> seems >> to be working fine, however when large application runs in a tight >> loop it is slowed down by garbage collection.I've seen this too. The locking in general significantly impacts the performance of tight-looped things like adding terms to a document or fetching documents from a set. I never thought to attribute it to garbage collection (never profiled it), but I can see how that might be the case.>> Using incremental garbage collection (-Xincgc ) seems to fix this >> problem, however: >> >> The spec. for java "finalize" method states that >> >> "The Java programming language does not guarantee which thread will >> invoke the finalize method for any given object. It is guaranteed, >> however, that the thread that invokes finalize will not be holding >> any >> user-visible synchronization locks when finalize is invoked." >> >> This seems to imply that synchronizing inside garbage collectior is >> not recommended. > > I think you're reading too much into the quoted text. To me it just > says that the thread won't be holding any user-visible locks already, > not that you can't take out locks. In fact, it seems to me it must > be telling you this precisely so you know that it is safe to take out > locks without worrying about existing locks the thread might already > have.I agree with Olly. Further, Sun is obviously talking about Java- based locks via "synchronized". Java's gc has no visibility into what Xapian_jni is doing and shouldn't care. Possible locking bugs aside, I think Xapain_jni is perfectly thread safe, even during garbage collection.> I'm not really a Java programmer, but unless such methods are > common in > other class libraries, I think this would result in an unnatural > Java API. > While WritableDatabase would benefit from having a close() method, it > shouldn't be required to call it.Agreed. Unfortunately, background garbage collection is what Java provides.>> Can I ask for your opinion on that ? > > The way SWIG handles this is to store the pointer to the C++ class in > the jlong directly, which avoids the need for using a map.Yeah, way more elegant than using pthread locking!> I think that would be a better approach. Or we could leave this until > we move to using SWIG for the Java bindings, since then it won't be an > issue.Another thought is to ditch both Xapian_jni and SWIG for Java, and instead implement a pure-Java xapian-tcp client. I'm not familiar with that side of Xapian, but that really seems the way to go for good Java support. Let xapian listen on localhost (or another box) and Java can just communicate with it directly. Something like this would eliminate all of the above issues (and more!) and really fit into the way Java likes to do things. Alternatively, re-write Xapian in Java. ;) eric