Naveen Nalam
2007-Nov-08 02:28 UTC
[dtrace-discuss] dtrace on java binary causes system hang
Hi,

This command causes my server to hang:

    dtrace -n 'pid$target::ServerClassMachine:return {printf( "%d\n", arg1) }' -c /usr/bin/java

I was running this command without any arguments to trace how Java was determining if it should run in client or server VM mode.

The above command will hang the box either on the 1st execution, or I may have to run the command 4-5 times to get the server to hang. But I can always reproduce it. I thought perhaps it was bad hardware, so I tried on another server with the same specs and also get the hang. This however only hangs my storage servers, all my other server types (compute, web) are fine (which have different hardware config).

Storage server specs:
Asus P5M2 mobo
Marvell MV8 controllers
6GB DDR2 DRAM
Core2 Duo cpu
Solaris S10 Update 4 with patch 125205-06 (sata fixes)

There is no crash dump. I also tried booting the kernel with "-k", "set nopanicdebug=1", and "KEYBOARD_ABORT=alternate" -- but couldn't get the kernel to break into the debugger.

If anyone has some suggestions, I can give them a try.

Thanks,
Naveen

--
This message posted from opensolaris.org
Adam Leventhal
2007-Nov-08 02:44 UTC
[dtrace-discuss] dtrace on java binary causes system hang
Hi Naveen,

I'm unable to reproduce this on the latest OpenSolaris bits, but I'll try to get it to happen on S10U4. This is obviously very strange and concerning. Do you hit this problem with every pid probe or is it just the ServerClassMachine function in /usr/bin/java?

Out of curiosity, can you send me the disassembly for that function (echo ServerClassMachine::dis | mdb /usr/bin/java)?

Thanks.

Adam

--
Adam Leventhal, FishWorks http://blogs.sun.com/ahl
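One way to approach the "every pid probe or just ServerClassMachine" question is to try the same one-liner against other functions in the launcher. The sketch below only prints candidate command lines so they can be reviewed before anything is run. The helper name pid_probe_cmd is hypothetical, and the choice of main as a second function is an assumption about what to try, not something the thread prescribes.

```shell
#!/bin/sh
# Sketch: build dtrace pid-provider command lines for different functions.
# pid_probe_cmd is a hypothetical helper; it prints a command, it does not run it.

# pid_probe_cmd FUNCTION [PROBE_NAME]
# PROBE_NAME defaults to "return", as in the original report.
pid_probe_cmd() {
    func=$1
    name=${2:-return}
    # \$ keeps $target literal so dtrace, not the shell, expands it.
    printf "dtrace -n 'pid\$target::%s:%s' -c /usr/bin/java\n" "$func" "$name"
}

pid_probe_cmd ServerClassMachine   # the probe from the original report
pid_probe_cmd main entry           # a different function and probe name
```

Since the original command hangs the affected machines, any of the printed commands should only be run on a box that can tolerate a hard reset.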
Peter B. Kessler
2007-Nov-08 05:17 UTC
[dtrace-discuss] dtrace on java binary causes system hang
If you just want to know if a particular JRE runs as a client or a server, you can use, e.g., here on a machine that isn't a server-class machine:

    $ java -version
    java version "1.7.0-ea"
    Java(TM) SE Runtime Environment (build 1.7.0-ea-b16)
    Java HotSpot(TM) Client VM (build 1.7.0-ea-b16, mixed mode, sharing)

If you want to see why a particular JRE decides to run as a server, you can use the relatively undocumented _JAVA_LAUNCHER_DEBUG environment variable, e.g., here on a server-class machine:

    $ _JAVA_LAUNCHER_DEBUG=true java -version
    ----_JAVA_LAUNCHER_DEBUG----
    ....
    pages: 262144 page_size: 8192 physical memory: 2147483648 (2.000GB)
    sysconf(_SC_NPROCESSORS_CONF): 2
    unix_sparc_ServerClassMachine: JNI_TRUE
    ServerClassMachine: returns default value of true

which implements the description in

    http://java.sun.com/j2se/1.5.0/docs/guide/vm/server-class.html

Or if you want to look at the code that does that, start from

    https://openjdk.dev.java.net/source/browse/openjdk/jdk/trunk/jdk/src/solaris/bin/ergo.c?rev=257&view=markup

and find the machine-dependent ServerClassMachineImpl for Solaris/SPARC (because it's easy to understand) at

    https://openjdk.dev.java.net/source/browse/openjdk/jdk/trunk/jdk/src/solaris/bin/ergo_sparc.c?rev=257&view=markup

The version of ServerClassMachineImpl for Solaris/i586 is in

    https://openjdk.dev.java.net/source/browse/openjdk/jdk/trunk/jdk/src/solaris/bin/ergo_i586.c?rev=257&view=markup

and is *much* more exciting.

I'm pointing at the code for what will be JDK-7. If you are poking at an older JDK, the code isn't arranged into files quite as nicely, but the functions are about the same.

I don't know why anything in there would cause DTrace to hang. It's all just user-land code. There is some assembler in the i586 implementation.

... peter
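The arithmetic behind the SPARC debug output above can be sketched in a few lines of shell. This is an illustration, not the launcher's code: the function name is_server_class is hypothetical, and the thresholds (at least 2 processors and at least 2 GB of physical memory) are assumed from the server-class description linked above; the real ergo.c logic may differ in detail.

```shell
#!/bin/sh
# Sketch of a server-class check from pages, page size, and CPU count.
# ASSUMPTION: thresholds are taken from the server-class description,
# not copied from the launcher source.

MIN_CPUS=2
MIN_BYTES=$((2 * 1024 * 1024 * 1024))   # 2 GB

# is_server_class PAGES PAGE_SIZE NCPUS
# Prints "server" if both thresholds are met, otherwise "client".
is_server_class() {
    pages=$1
    page_size=$2
    ncpus=$3
    bytes=$((pages * page_size))         # physical memory in bytes
    if [ "$ncpus" -ge "$MIN_CPUS" ] && [ "$bytes" -ge "$MIN_BYTES" ]; then
        echo server
    else
        echo client
    fi
}

# Values from the debug output: 262144 pages of 8192 bytes, 2 CPUs.
is_server_class 262144 8192 2   # prints "server"
```

With those inputs, 262144 * 8192 gives the 2147483648 bytes (2.000GB) shown in the output, which just meets the assumed memory threshold.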
Peter B. Kessler
2007-Nov-08 06:22 UTC
[dtrace-discuss] dtrace on java binary causes system hang
Naveen Nalam wrote:
> Hi Peter,
>
> I had started down this path because we have servers with Pentium D
> 3.2GHz and 8GB RAM, but java was running with client VM by default. I
> believe the Pentium D has two physical cores, which is why I was
> curious about the client VM becoming default.
>
> I just ran with your secret ENV variable, and that returns:
> vendor: G e n u i n e I n t e l
> value_of_eax: 0xf62 value_of_edx: 0xbfebfbff
> Hyperthreading supported
> logical processors per package: 2
> physical processors: 1
> solaris_i386_ServerClassMachine: false
> Default VM: client

Which version of the JRE are you running? I think older versions of the logical processor code get confused on newer Intel processors. (How does one write code that's forward compatible?) I think that got fixed in JDK-6 (or one of the updates to JDK-6), but I don't have a 3.2GHz Pentium D and 8GB in my laptop, so I can't test it right now. Can you try the most recent JDK-6 update 3 from

    http://java.sun.com/javase/downloads/?intcmp=1281

or the nearly final version of JDK-6 update 5 from

    http://download.java.net/jdk6/binaries/

or the bleeding edge JDK-7 from

    http://download.java.net/jdk7/binaries/

You can only use JDK-6 update 3 in production, but to figure out if we understand your processors, any of them will do.

> When I started to investigate this, I tried to run it on some of our
> Core2Duo ZFS servers to see what the default VM was - and found the
> command hangs the storage box.
>
> I just ran the command again on a 3rd storage server to collect more
> info for Adam, but it froze the box. This time I ran with the ustack
> probe:
>
> [root@xxxx]# dtrace -n 'pid$target::ServerClassMachine:return {ustack() }' -c /usr/bin/java
> dtrace: description 'pid$target::ServerClassMachine:return ' matched 1 probe
> ....
> dtrace: pid 3792 has exited
> CPU     ID                    FUNCTION:NAME
>   1  41473       ServerClassMachine:return
>               java`ServerClassMachine+0x1c2
>               java`CreateExecutionEnvironment+0x665
>               java`main+0xc2
>               java`_start+0x7a
>
>   1  41473       ServerClassMachine:return
>               java`ServerClassMachine+0x1c2
>               libc.so.1`_thr_setup+0x4e
>               libc.so.1`_lwp_start
>
>   0  41472       ServerClassMachine:return
>               java`ServerClassMachine+0x1c2
>               java`CreateExecutionEnvironment+0x665
>               java`main+0xc2
>               java`_start+0x7a
>
> --the server is hung at this point, doesn't return to the bash prompt.
> server not responsive to pings--

Does "java -version" run on these storage servers, or does that hang too? Or do you need DTrace on it to cause the problem? Do you have any 3.2GHz Pentium D's that aren't in these storage servers? That is, is it the processor or the machine configuration that's the problem? I'll ask around and see if we have any of them in our labs.

... peter

> -Naveen
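One possible reading of the i386 debug output quoted in this message, sketched in shell: if the launcher derives the physical processor count by dividing the configured CPU count by "logical processors per package", then a dual-core chip that reports 2 logical processors per package is counted as a single physical processor and fails the server-class test. The function name physical_cpus is hypothetical, and the configured CPU count of 2 is an assumption; the thread does not show the sysconf line for that machine.

```shell
#!/bin/sh
# Sketch: derive a "physical processor" count from the configured CPU
# count and the logical-processors-per-package value.
# ASSUMPTION: this division mirrors what the launcher does; the thread
# itself only shows the resulting debug output.

# physical_cpus NCPUS LOGICAL_PER_PACKAGE
physical_cpus() {
    ncpus=$1
    logical=$2
    if [ "$ncpus" -gt 1 ] && [ "$logical" -gt 1 ]; then
        echo $((ncpus / logical))
    else
        echo "$ncpus"
    fi
}

# Assumed inputs: 2 configured CPUs, 2 logical processors per package.
physical_cpus 2 2   # prints "1", matching "physical processors: 1"
```

Under that reading, two real cores are indistinguishable from one hyperthreaded core, which would be consistent with the "get confused on newer Intel processors" remark.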
Peter B. Kessler
2007-Nov-08 17:52 UTC
[dtrace-discuss] dtrace on java binary causes system hang
JDK-1.6.0_03-b05 (that's JDK 6, update 3 build 5) has all the changes I know of for recent Intel chips.

I'm going to hope that the DTrace folks can figure this one out. If they need help on the Java side of things I can help.

... peter

Naveen Nalam wrote:
>> Which version of the JRE are you running? I think older versions
>> of the logical processor code get confused on newer Intel processors.
>
> This is what I'm running at the moment:
>
> Compute server: Pentium D 3.2GHz, 8GB RAM, no dtrace problems:
> [root@computeX /]# /usr/bin/java -version
> java version "1.6.0_03"
> Java(TM) SE Runtime Environment (build 1.6.0_03-b05)
> Java HotSpot(TM) Client VM (build 1.6.0_03-b05, mixed mode)
>
> Storage server: Core2Duo, 6GB RAM, has dtrace problems:
> [root@storageX /]# /usr/bin/java -version
> java version "1.6.0_03"
> Java(TM) SE Runtime Environment (build 1.6.0_03-b05)
> Java HotSpot(TM) Server VM (build 1.6.0_03-b05, mixed mode)
>
>> Does "java -version" run on these storage servers, or does that hang
>> too? Or do you need DTrace on it to cause the problem? Do you have
>> any 3.2GHz Pentium D's that aren't in these storage servers? That is,
>> is it the processor or the machine configuration that's the problem?
>> I'll ask around and see if we have any of them in our labs.
>
> "java -version" runs on both the compute and storage server without
> any problems. things only hang on the storage server when running with
> the dtrace.
>
> all my Pentium D boxes are in compute servers. all my storage servers
> are Core2Duo based. I don't have any Pentium D based storage servers
> to test with.
>
> -Naveen