Daniel P. Berrangé
2019-Apr-18 17:33 UTC
Re: [libvirt-users] [libvirt] JVM crashes during GC
On Thu, Apr 18, 2019 at 10:46:19PM +0530, Sachin Soman wrote:> I am attaching the execution results. At the top of each file I have > mentioned the environment details. > > Following is the test program I have used: > > =================================================> > *package* org.libvirt; > > > *import* org.libvirt.jna.Libvirt; > > > *public* *class* LibvirtCrashTest { > > *void* createAndDestroyDefaultAuthConnection() { > > ConnectAuth ca = *new* ConnectAuthDefault(); > > *try* { > > System.*out*.println("Starting new connection with default auth"); > > Connect connect = *new* Connect("esx://x.x.x.x/?no_verify=1", ca, 0);It could be interesting to try different libvirt drivers. eg "test:///default" this could help identify if its a bug in libvirt common code vs a bug in only the ESX driver code.> > Thread.*sleep*(1000); > > System.*out*.println("Explicit connection closure"); > > connect.close(); > > Thread.*sleep*(5000); > > } *catch* (Exception e) { > > e.printStackTrace(); > > } > > } > > > *public* *static* *void* main(String[] args) *throws* Exception { > > LibvirtCrashTest testInstance = *new* LibvirtCrashTest(); > > > *for*(*int* counter = 0; counter < 3; counter++) { > > testInstance.createAndDestroyDefaultAuthConnection(); > > System.*out*.println("gc'ing"); > > System.*gc*(); > > System.*out*.println("gc'd"); > > *int* tCounter = 0; > > *while*(tCounter++ < 20) { > > System.*out*.println("waiting.. " + tCounter); > > Thread.*sleep*(1000); > > } > > } > > System.*out*.println("Going down..."); > > } > > > } > =================================================> > > Thanks & Regards > Sachin Soman > > > > > On Thu, Apr 18, 2019 at 9:25 PM Daniel P. Berrangé <berrange@redhat.com> > wrote: > > > On Thu, Apr 18, 2019 at 05:51:06PM +0200, Michal Prívozník wrote: > > > On 4/17/19 10:24 AM, Sachin Soman wrote: > > > > Hi, > > > > > > > > Could you tell me if the following is some known issue? > > > > > > > > While performing the following simple test, I see my JVM crashing > > > > (consistently): > > > > 1. Open a connection to an ESXi driver/host (passing ConnectAuthDefault > > > > instance). > > > > 2. Close the connection. > > > > 3. Invoke GC > > > > > > > > When GC is triggered, at some point, some unallocated native memory is > > > > being tried to release. That's failing. > > > > > > > > The error thrown is: > > > > > > > > java(78745,0x70000241e000) malloc: *** error for object 0x7fd5df561390: > > > > pointer being freed was not allocated > > > > > > > > *** set a breakpoint in malloc_error_break to debug > > > > > > > > > > > > Frames from core dump: > > > > > > > > frame #0: 0x00007fff5b274b66 libsystem_kernel.dylib`__pthread_kill > > + 10 > > > > > > > > frame #1: 0x00007fff5b43f080 libsystem_pthread.dylib`pthread_kill > > + 333 > > > > > > > > frame #2: 0x00007fff5b1d01ae libsystem_c.dylib`abort + 127 > > > > > > > > frame #3: 0x00007fff5b2ce8a6 libsystem_malloc.dylib`free + 521 > > > > > > > > frame #4: 0x00000001127f43a7 > > > > > > > > frame #5: 0x00000001127e3ffd > > > > > > > > frame #6: 0x00000001127e3ffd > > > > > > > > frame #7: 0x00000001127e3ffd > > > > > > > > frame #8: 0x00000001127e3ffd > > > > > > > > frame #9: 0x00000001127e4042 > > > > > > > > frame #10: 0x00000001127e3ffd > > > > > > > > frame #11: 0x00000001127e3ffd > > > > > > > > frame #12: 0x00000001127dc4e7 > > > > > > > > frame #13: 0x000000010c0e235e > > > > libjvm.dylib`JavaCalls::call_helper(JavaValue*, methodHandle*, > > > > JavaCallArguments*, Thread*) + 1710 > > > > > > > > frame #14: 0x000000010c0e2b02 > > > > libjvm.dylib`JavaCalls::call_virtual(JavaValue*, KlassHandle, Symbol*, > > > > Symbol*, JavaCallArguments*, Thread*) + 356 > > > > > > > > frame #15: 0x000000010c0e2cae > > > > libjvm.dylib`JavaCalls::call_virtual(JavaValue*, Handle, KlassHandle, > > > > Symbol*, Symbol*, Thread*) + 74 > > > > > > > > frame #16: 0x000000010c1208ee > > libjvm.dylib`thread_entry(JavaThread*, > > > > Thread*) + 124 > > > > > > > > frame #17: 0x000000010c33e84d > > > > libjvm.dylib`JavaThread::thread_main_inner() + 155 > > > > > > > > frame #18: 0x000000010c33ff12 libjvm.dylib`JavaThread::run() + 448 > > > > > > > > frame #19: 0x000000010c26058a libjvm.dylib`java_start(Thread*) + > > 246 > > > > > > > > frame #20: 0x00007fff5b43c661 > > libsystem_pthread.dylib`_pthread_body + > > > > 340 > > > > > > > > frame #21: 0x00007fff5b43c50d > > libsystem_pthread.dylib`_pthread_start + > > > > 377 > > > > > > > > frame #22: 0x00007fff5b43bbf9 libsystem_pthread.dylib`thread_start > > + 13 > > > > > > > > > > > > I have installed Libvirt 5.2.0. > > > > Java bindings libvirt-java 0.5.1 > > > > JNA 4.0.0 > > > > Tested Java environments: Oracle Java 8 and OpenJDK 8 on MAC, OpenJDK > > 11 on > > > > Ubuntu 16 > > > > > > The backtrace does not suggest it's libvirt related, but I wouldn't be > > > surprised if our Java bindings mangled memory somewhere. They are > > > heavily unmaintained. > > > > It could just as easily be a memory corruption bug in the ESX libvirt > > driver, since that runs directly in the applicatin process as it is a > > stateless client side driver. > > > > We would probably need to have an small demo program that can reproduce > > the problem in an isolated fashion, in order to try to debug it, along > > with full libvirt debug logs. > > > > > > Regards, > > Daniel > > -- > > |: https://berrange.com -o- > > https://www.flickr.com/photos/dberrange :| > > |: https://libvirt.org -o- > > https://fstop138.berrange.com :| > > |: https://entangle-photo.org -o- > > https://www.instagram.com/dberrange :| > >Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
I have tried the same tests using the "test" driver, and that works perfectly; no errors seen. Thanks & Regards Sachin Soman On Thu, Apr 18, 2019 at 11:03 PM Daniel P. Berrangé <berrange@redhat.com> wrote:> On Thu, Apr 18, 2019 at 10:46:19PM +0530, Sachin Soman wrote: > > I am attaching the execution results. At the top of each file I have > > mentioned the environment details. > > > > Following is the test program I have used: > > > > =================================================> > > > *package* org.libvirt; > > > > > > *import* org.libvirt.jna.Libvirt; > > > > > > *public* *class* LibvirtCrashTest { > > > > *void* createAndDestroyDefaultAuthConnection() { > > > > ConnectAuth ca = *new* ConnectAuthDefault(); > > > > *try* { > > > > System.*out*.println("Starting new connection with default auth"); > > > > Connect connect = *new* Connect("esx://x.x.x.x/?no_verify=1", ca, 0); > > It could be interesting to try different libvirt drivers. > > eg "test:///default" > > this could help identify if its a bug in libvirt common code > vs a bug in only the ESX driver code. > > > > > Thread.*sleep*(1000); > > > > System.*out*.println("Explicit connection closure"); > > > > connect.close(); > > > > Thread.*sleep*(5000); > > > > } *catch* (Exception e) { > > > > e.printStackTrace(); > > > > } > > > > } > > > > > > *public* *static* *void* main(String[] args) *throws* Exception { > > > > LibvirtCrashTest testInstance = *new* LibvirtCrashTest(); > > > > > > *for*(*int* counter = 0; counter < 3; counter++) { > > > > testInstance.createAndDestroyDefaultAuthConnection(); > > > > System.*out*.println("gc'ing"); > > > > System.*gc*(); > > > > System.*out*.println("gc'd"); > > > > *int* tCounter = 0; > > > > *while*(tCounter++ < 20) { > > > > System.*out*.println("waiting.. " + tCounter); > > > > Thread.*sleep*(1000); > > > > } > > > > } > > > > System.*out*.println("Going down..."); > > > > } > > > > > > } > > =================================================> > > > > > Thanks & Regards > > Sachin Soman > > > > > > > > > > On Thu, Apr 18, 2019 at 9:25 PM Daniel P. Berrangé <berrange@redhat.com> > > wrote: > > > > > On Thu, Apr 18, 2019 at 05:51:06PM +0200, Michal Prívozník wrote: > > > > On 4/17/19 10:24 AM, Sachin Soman wrote: > > > > > Hi, > > > > > > > > > > Could you tell me if the following is some known issue? > > > > > > > > > > While performing the following simple test, I see my JVM crashing > > > > > (consistently): > > > > > 1. Open a connection to an ESXi driver/host (passing > ConnectAuthDefault > > > > > instance). > > > > > 2. Close the connection. > > > > > 3. Invoke GC > > > > > > > > > > When GC is triggered, at some point, some unallocated native > memory is > > > > > being tried to release. That's failing. > > > > > > > > > > The error thrown is: > > > > > > > > > > java(78745,0x70000241e000) malloc: *** error for object > 0x7fd5df561390: > > > > > pointer being freed was not allocated > > > > > > > > > > *** set a breakpoint in malloc_error_break to debug > > > > > > > > > > > > > > > Frames from core dump: > > > > > > > > > > frame #0: 0x00007fff5b274b66 > libsystem_kernel.dylib`__pthread_kill > > > + 10 > > > > > > > > > > frame #1: 0x00007fff5b43f080 > libsystem_pthread.dylib`pthread_kill > > > + 333 > > > > > > > > > > frame #2: 0x00007fff5b1d01ae libsystem_c.dylib`abort + 127 > > > > > > > > > > frame #3: 0x00007fff5b2ce8a6 libsystem_malloc.dylib`free + 521 > > > > > > > > > > frame #4: 0x00000001127f43a7 > > > > > > > > > > frame #5: 0x00000001127e3ffd > > > > > > > > > > frame #6: 0x00000001127e3ffd > > > > > > > > > > frame #7: 0x00000001127e3ffd > > > > > > > > > > frame #8: 0x00000001127e3ffd > > > > > > > > > > frame #9: 0x00000001127e4042 > > > > > > > > > > frame #10: 0x00000001127e3ffd > > > > > > > > > > frame #11: 0x00000001127e3ffd > > > > > > > > > > frame #12: 0x00000001127dc4e7 > > > > > > > > > > frame #13: 0x000000010c0e235e > > > > > libjvm.dylib`JavaCalls::call_helper(JavaValue*, methodHandle*, > > > > > JavaCallArguments*, Thread*) + 1710 > > > > > > > > > > frame #14: 0x000000010c0e2b02 > > > > > libjvm.dylib`JavaCalls::call_virtual(JavaValue*, KlassHandle, > Symbol*, > > > > > Symbol*, JavaCallArguments*, Thread*) + 356 > > > > > > > > > > frame #15: 0x000000010c0e2cae > > > > > libjvm.dylib`JavaCalls::call_virtual(JavaValue*, Handle, > KlassHandle, > > > > > Symbol*, Symbol*, Thread*) + 74 > > > > > > > > > > frame #16: 0x000000010c1208ee > > > libjvm.dylib`thread_entry(JavaThread*, > > > > > Thread*) + 124 > > > > > > > > > > frame #17: 0x000000010c33e84d > > > > > libjvm.dylib`JavaThread::thread_main_inner() + 155 > > > > > > > > > > frame #18: 0x000000010c33ff12 libjvm.dylib`JavaThread::run() + > 448 > > > > > > > > > > frame #19: 0x000000010c26058a libjvm.dylib`java_start(Thread*) > + > > > 246 > > > > > > > > > > frame #20: 0x00007fff5b43c661 > > > libsystem_pthread.dylib`_pthread_body + > > > > > 340 > > > > > > > > > > frame #21: 0x00007fff5b43c50d > > > libsystem_pthread.dylib`_pthread_start + > > > > > 377 > > > > > > > > > > frame #22: 0x00007fff5b43bbf9 > libsystem_pthread.dylib`thread_start > > > + 13 > > > > > > > > > > > > > > > I have installed Libvirt 5.2.0. > > > > > Java bindings libvirt-java 0.5.1 > > > > > JNA 4.0.0 > > > > > Tested Java environments: Oracle Java 8 and OpenJDK 8 on MAC, > OpenJDK > > > 11 on > > > > > Ubuntu 16 > > > > > > > > The backtrace does not suggest it's libvirt related, but I wouldn't > be > > > > surprised if our Java bindings mangled memory somewhere. They are > > > > heavily unmaintained. > > > > > > It could just as easily be a memory corruption bug in the ESX libvirt > > > driver, since that runs directly in the applicatin process as it is a > > > stateless client side driver. > > > > > > We would probably need to have an small demo program that can reproduce > > > the problem in an isolated fashion, in order to try to debug it, along > > > with full libvirt debug logs. > > > > > > > > > Regards, > > > Daniel > > > -- > > > |: https://berrange.com -o- > > > https://www.flickr.com/photos/dberrange :| > > > |: https://libvirt.org -o- > > > https://fstop138.berrange.com :| > > > |: https://entangle-photo.org -o- > > > https://www.instagram.com/dberrange :| > > > > > > > > > > > > Regards, > Daniel > -- > |: https://berrange.com -o- > https://www.flickr.com/photos/dberrange :| > |: https://libvirt.org -o- > https://fstop138.berrange.com :| > |: https://entangle-photo.org -o- > https://www.instagram.com/dberrange :| >
Did you get a chance to debug the issue? Thanks & Regards, Sachin Soman On Thu, Apr 18, 2019, 11:10 PM Sachin Soman <sachonline.soman@gmail.com> wrote:> I have tried the same tests using the "test" driver, and that works > perfectly; no errors seen. > > Thanks & Regards > Sachin Soman > > > > > On Thu, Apr 18, 2019 at 11:03 PM Daniel P. Berrangé <berrange@redhat.com> > wrote: > >> On Thu, Apr 18, 2019 at 10:46:19PM +0530, Sachin Soman wrote: >> > I am attaching the execution results. At the top of each file I have >> > mentioned the environment details. >> > >> > Following is the test program I have used: >> > >> > =================================================>> > >> > *package* org.libvirt; >> > >> > >> > *import* org.libvirt.jna.Libvirt; >> > >> > >> > *public* *class* LibvirtCrashTest { >> > >> > *void* createAndDestroyDefaultAuthConnection() { >> > >> > ConnectAuth ca = *new* ConnectAuthDefault(); >> > >> > *try* { >> > >> > System.*out*.println("Starting new connection with default auth"); >> > >> > Connect connect = *new* Connect("esx://x.x.x.x/?no_verify=1", ca, 0); >> >> It could be interesting to try different libvirt drivers. >> >> eg "test:///default" >> >> this could help identify if its a bug in libvirt common code >> vs a bug in only the ESX driver code. >> >> > >> > Thread.*sleep*(1000); >> > >> > System.*out*.println("Explicit connection closure"); >> > >> > connect.close(); >> > >> > Thread.*sleep*(5000); >> > >> > } *catch* (Exception e) { >> > >> > e.printStackTrace(); >> > >> > } >> > >> > } >> > >> > >> > *public* *static* *void* main(String[] args) *throws* Exception { >> > >> > LibvirtCrashTest testInstance = *new* LibvirtCrashTest(); >> > >> > >> > *for*(*int* counter = 0; counter < 3; counter++) { >> > >> > testInstance.createAndDestroyDefaultAuthConnection(); >> > >> > System.*out*.println("gc'ing"); >> > >> > System.*gc*(); >> > >> > System.*out*.println("gc'd"); >> > >> > *int* tCounter = 0; >> > >> > *while*(tCounter++ < 20) { >> > >> > System.*out*.println("waiting.. " + tCounter); >> > >> > Thread.*sleep*(1000); >> > >> > } >> > >> > } >> > >> > System.*out*.println("Going down..."); >> > >> > } >> > >> > >> > } >> > =================================================>> > >> > >> > Thanks & Regards >> > Sachin Soman >> > >> > >> > >> > >> > On Thu, Apr 18, 2019 at 9:25 PM Daniel P. Berrangé <berrange@redhat.com >> > >> > wrote: >> > >> > > On Thu, Apr 18, 2019 at 05:51:06PM +0200, Michal Prívozník wrote: >> > > > On 4/17/19 10:24 AM, Sachin Soman wrote: >> > > > > Hi, >> > > > > >> > > > > Could you tell me if the following is some known issue? >> > > > > >> > > > > While performing the following simple test, I see my JVM crashing >> > > > > (consistently): >> > > > > 1. Open a connection to an ESXi driver/host (passing >> ConnectAuthDefault >> > > > > instance). >> > > > > 2. Close the connection. >> > > > > 3. Invoke GC >> > > > > >> > > > > When GC is triggered, at some point, some unallocated native >> memory is >> > > > > being tried to release. That's failing. >> > > > > >> > > > > The error thrown is: >> > > > > >> > > > > java(78745,0x70000241e000) malloc: *** error for object >> 0x7fd5df561390: >> > > > > pointer being freed was not allocated >> > > > > >> > > > > *** set a breakpoint in malloc_error_break to debug >> > > > > >> > > > > >> > > > > Frames from core dump: >> > > > > >> > > > > frame #0: 0x00007fff5b274b66 >> libsystem_kernel.dylib`__pthread_kill >> > > + 10 >> > > > > >> > > > > frame #1: 0x00007fff5b43f080 >> libsystem_pthread.dylib`pthread_kill >> > > + 333 >> > > > > >> > > > > frame #2: 0x00007fff5b1d01ae libsystem_c.dylib`abort + 127 >> > > > > >> > > > > frame #3: 0x00007fff5b2ce8a6 libsystem_malloc.dylib`free + 521 >> > > > > >> > > > > frame #4: 0x00000001127f43a7 >> > > > > >> > > > > frame #5: 0x00000001127e3ffd >> > > > > >> > > > > frame #6: 0x00000001127e3ffd >> > > > > >> > > > > frame #7: 0x00000001127e3ffd >> > > > > >> > > > > frame #8: 0x00000001127e3ffd >> > > > > >> > > > > frame #9: 0x00000001127e4042 >> > > > > >> > > > > frame #10: 0x00000001127e3ffd >> > > > > >> > > > > frame #11: 0x00000001127e3ffd >> > > > > >> > > > > frame #12: 0x00000001127dc4e7 >> > > > > >> > > > > frame #13: 0x000000010c0e235e >> > > > > libjvm.dylib`JavaCalls::call_helper(JavaValue*, methodHandle*, >> > > > > JavaCallArguments*, Thread*) + 1710 >> > > > > >> > > > > frame #14: 0x000000010c0e2b02 >> > > > > libjvm.dylib`JavaCalls::call_virtual(JavaValue*, KlassHandle, >> Symbol*, >> > > > > Symbol*, JavaCallArguments*, Thread*) + 356 >> > > > > >> > > > > frame #15: 0x000000010c0e2cae >> > > > > libjvm.dylib`JavaCalls::call_virtual(JavaValue*, Handle, >> KlassHandle, >> > > > > Symbol*, Symbol*, Thread*) + 74 >> > > > > >> > > > > frame #16: 0x000000010c1208ee >> > > libjvm.dylib`thread_entry(JavaThread*, >> > > > > Thread*) + 124 >> > > > > >> > > > > frame #17: 0x000000010c33e84d >> > > > > libjvm.dylib`JavaThread::thread_main_inner() + 155 >> > > > > >> > > > > frame #18: 0x000000010c33ff12 libjvm.dylib`JavaThread::run() >> + 448 >> > > > > >> > > > > frame #19: 0x000000010c26058a >> libjvm.dylib`java_start(Thread*) + >> > > 246 >> > > > > >> > > > > frame #20: 0x00007fff5b43c661 >> > > libsystem_pthread.dylib`_pthread_body + >> > > > > 340 >> > > > > >> > > > > frame #21: 0x00007fff5b43c50d >> > > libsystem_pthread.dylib`_pthread_start + >> > > > > 377 >> > > > > >> > > > > frame #22: 0x00007fff5b43bbf9 >> libsystem_pthread.dylib`thread_start >> > > + 13 >> > > > > >> > > > > >> > > > > I have installed Libvirt 5.2.0. >> > > > > Java bindings libvirt-java 0.5.1 >> > > > > JNA 4.0.0 >> > > > > Tested Java environments: Oracle Java 8 and OpenJDK 8 on MAC, >> OpenJDK >> > > 11 on >> > > > > Ubuntu 16 >> > > > >> > > > The backtrace does not suggest it's libvirt related, but I wouldn't >> be >> > > > surprised if our Java bindings mangled memory somewhere. They are >> > > > heavily unmaintained. >> > > >> > > It could just as easily be a memory corruption bug in the ESX libvirt >> > > driver, since that runs directly in the applicatin process as it is a >> > > stateless client side driver. >> > > >> > > We would probably need to have an small demo program that can >> reproduce >> > > the problem in an isolated fashion, in order to try to debug it, along >> > > with full libvirt debug logs. >> > > >> > > >> > > Regards, >> > > Daniel >> > > -- >> > > |: https://berrange.com -o- >> > > https://www.flickr.com/photos/dberrange :| >> > > |: https://libvirt.org -o- >> > > https://fstop138.berrange.com :| >> > > |: https://entangle-photo.org -o- >> > > https://www.instagram.com/dberrange :| >> > > >> >> >> >> >> >> >> >> >> Regards, >> Daniel >> -- >> |: https://berrange.com -o- >> https://www.flickr.com/photos/dberrange :| >> |: https://libvirt.org -o- >> https://fstop138.berrange.com :| >> |: https://entangle-photo.org -o- >> https://www.instagram.com/dberrange :| >> >