Robert Milkowski
2010-Mar-29 10:58 UTC
[dtrace-discuss] libc:semop() consumes lots of CPU inside Solaris 9 branded zone
Hi, I have an Oracle 8.1.7.4 64bit running in a Solaris 9 branded zone on-top of Solaris 10. When I put a tablespace in a hotbackup mode it takes long time - usually more than 4s for each tablespace. Looks like ora_ckpt process is consuming 100% of cpu (in usr) while a tablespace is being put in the hotbackup mode. It looks like there might be an issue with semop() interposition in S9 branded zone. I can''t easily test it now but IIRC when running Oracle in a global zone there was no such issue. More details below. First, lets check a PID of ora_ckpt # ps -ef | grep 20326 oracle 20326 1 0 15:37:26 ? 15:52 ora_ckpt_I321 Now lets put a TEMP tablespace in a backup mode while running a dtrace script as well: SQL> alter tablespace TEMP begin backup; Tablespace altered. Elapsed: 00:00:04.65 # dtrace -n sched:::on-cpu''/pid==20326 && tid==1/{self->t=timestamp;}'' -n sched:::off-cpu''/self->t/{@u[ustack()]=sum(timestamp-self->t);@c[ustack()]=count();self->t=0;}'' -n tick-10s''{printa(@u);printa(@c)}'' dtrace: description ''sched:::on-cpu'' matched 1 probe dtrace: description ''sched:::off-cpu'' matched 2 probes dtrace: description ''tick-10s'' matched 1 probe ^C `_s9brand_systemcall+0x30 `s9_semsys+0x16c `s9_handler+0x84 oracle`sslssalck+0x514 libc.so.1`semop+0x18 oracle`sskgpwwait+0x13c oracle`ksliwat+0x44c oracle`kslwaitns+0x20 oracle`kskthbwt+0x174 oracle`kslwait+0x50 oracle`ksarcv+0x1bc oracle`ksbabs+0x344 oracle`ksbrdp+0x2ec oracle`opirip+0x31c oracle`opidrv+0x6a0 oracle`sou2o+0x10 oracle`main+0x128 oracle`_start+0x17c oracle`_start+0x18 3852868427 oracle`_s9brand_systemcall+0x30 oracle`s9_semsys+0x16c oracle`s9_handler+0x84 oracle`sslssalck+0x514 libc.so.1`semop+0x18 oracle`sskgpwwait+0x13c oracle`ksliwat+0x44c oracle`kslwaitns+0x20 oracle`kskthbwt+0x174 oracle`kslwait+0x50 oracle`ksarcv+0x1bc oracle`ksbabs+0x344 oracle`ksbrdp+0x2ec oracle`opirip+0x31c oracle`opidrv+0x6a0 oracle`sou2o+0x10 oracle`main+0x128 oracle`_start+0x17c oracle`_start+0x18 12 So the same ustack() consumed almost 4s of a CPU time out of 4.6s of a total time of the operation. Has anyone been hit by it? Also the 2nd stack looks strange (top 5 lines)... -- Robert Milkowski http://milek.blogspot.com -- This message posted from opensolaris.org
Edward Pilatowicz
2010-Mar-29 23:22 UTC
[dtrace-discuss] libc:semop() consumes lots of CPU inside Solaris 9 branded zone
hey robert, so i don''t agree with your interpretation of the dtrace data below. the data doesn''t show a long time spent executing in semop(). instead it shows that the most common way that a database thread blocks after doing some processing is by calling semop(). ie, your script takes a timestamp when a thread starts executing (probably when returning from a semop() operation). then when the thread stops executing, you look at the elapsed time and then you''re attributing that execution time to the function where it stopped execution. that doesn''t really seeem right. i''ll bet that the majority of execution time was spent somewhere else, not in semop(). semop() is just the place where the thread is getting swapped off proc. ed On Mon, Mar 29, 2010 at 03:58:13AM -0700, Robert Milkowski wrote:> Hi, > > I have an Oracle 8.1.7.4 64bit running in a Solaris 9 branded zone on-top of Solaris 10. > When I put a tablespace in a hotbackup mode it takes long time - usually more than 4s for each tablespace. Looks like ora_ckpt process is consuming 100% of cpu (in usr) while a tablespace is being put in the hotbackup mode. > > It looks like there might be an issue with semop() interposition in S9 branded zone. > I can''t easily test it now but IIRC when running Oracle in a global zone there was no such issue. > > More details below. > > First, lets check a PID of ora_ckpt > > # ps -ef | grep 20326 > oracle 20326 1 0 15:37:26 ? 15:52 ora_ckpt_I321 > > Now lets put a TEMP tablespace in a backup mode while running a dtrace script as well: > > SQL> alter tablespace TEMP begin backup; > > Tablespace altered. > > Elapsed: 00:00:04.65 > > > > # dtrace -n sched:::on-cpu''/pid==20326 && tid==1/{self->t=timestamp;}'' -n sched:::off-cpu''/self->t/{@u[ustack()]=sum(timestamp-self->t);@c[ustack()]=count();self->t=0;}'' -n tick-10s''{printa(@u);printa(@c)}'' > dtrace: description ''sched:::on-cpu'' matched 1 probe > dtrace: description ''sched:::off-cpu'' matched 2 probes > dtrace: description ''tick-10s'' matched 1 probe > ^C > > > `_s9brand_systemcall+0x30 > `s9_semsys+0x16c > `s9_handler+0x84 > oracle`sslssalck+0x514 > libc.so.1`semop+0x18 > oracle`sskgpwwait+0x13c > oracle`ksliwat+0x44c > oracle`kslwaitns+0x20 > oracle`kskthbwt+0x174 > oracle`kslwait+0x50 > oracle`ksarcv+0x1bc > oracle`ksbabs+0x344 > oracle`ksbrdp+0x2ec > oracle`opirip+0x31c > oracle`opidrv+0x6a0 > oracle`sou2o+0x10 > oracle`main+0x128 > oracle`_start+0x17c > oracle`_start+0x18 > 3852868427 > > oracle`_s9brand_systemcall+0x30 > oracle`s9_semsys+0x16c > oracle`s9_handler+0x84 > oracle`sslssalck+0x514 > libc.so.1`semop+0x18 > oracle`sskgpwwait+0x13c > oracle`ksliwat+0x44c > oracle`kslwaitns+0x20 > oracle`kskthbwt+0x174 > oracle`kslwait+0x50 > oracle`ksarcv+0x1bc > oracle`ksbabs+0x344 > oracle`ksbrdp+0x2ec > oracle`opirip+0x31c > oracle`opidrv+0x6a0 > oracle`sou2o+0x10 > oracle`main+0x128 > oracle`_start+0x17c > oracle`_start+0x18 > 12 > > So the same ustack() consumed almost 4s of a CPU time out of 4.6s of a total time of the operation. > > Has anyone been hit by it? > > Also the 2nd stack looks strange (top 5 lines)... > > -- > Robert Milkowski > http://milek.blogspot.com > -- > This message posted from opensolaris.org > _______________________________________________ > dtrace-discuss mailing list > dtrace-discuss at opensolaris.org >
Robert Milkowski
2010-Mar-30 14:39 UTC
[dtrace-discuss] libc:semop() consumes lots of CPU inside Solaris 9 branded zone
you might be right and you probably are that''s why I want to use PID provider but unfortunately there is some bad interaction between PID provided and Oracle when running in a S9 zone which causes Oracle to crash. Adam is looking into it. I got another window I can look at it tomorrow and I will try to use profile provider. I will also try to run the same Oracle directly under Solaris 10 global zone and see if it behaves the same way or not.
Edward Pilatowicz
2010-Mar-30 17:23 UTC
[dtrace-discuss] libc:semop() consumes lots of CPU inside Solaris 9 branded zone
oops. my recommendation to use the pid provider was misguided since, as one of my co-workers pointed out, currently the pid provider and plockstat are not supported on s9 (or s8) branded zones, as documented here: http://docs.sun.com/app/docs/doc/820-4490/gfjbk?a=view ed On Tue, Mar 30, 2010 at 03:39:01PM +0100, Robert Milkowski wrote:> > you might be right and you probably are that''s why I want to use PID > provider but unfortunately there is some bad interaction between PID > provided and Oracle when running in a S9 zone which causes Oracle to > crash. Adam is looking into it. > > I got another window I can look at it tomorrow and I will try to use > profile provider. > I will also try to run the same Oracle directly under Solaris 10 > global zone and see if it behaves the same way or not. > > _______________________________________________ > dtrace-discuss mailing list > dtrace-discuss at opensolaris.org
Adam Leventhal
2010-Mar-30 17:54 UTC
[dtrace-discuss] libc:semop() consumes lots of CPU inside Solaris 9 branded zone
Hey Ed, What''s the failure mode of the pid provider in a branded zone? I imagine it could be quite nasty. Adam On Mar 30, 2010, at 10:23 AM, Edward Pilatowicz wrote:> oops. my recommendation to use the pid provider was misguided since, as > one of my co-workers pointed out, currently the pid provider and > plockstat are not supported on s9 (or s8) branded zones, as documented > here: > > http://docs.sun.com/app/docs/doc/820-4490/gfjbk?a=view > > ed > > On Tue, Mar 30, 2010 at 03:39:01PM +0100, Robert Milkowski wrote: >> >> you might be right and you probably are that''s why I want to use PID >> provider but unfortunately there is some bad interaction between PID >> provided and Oracle when running in a S9 zone which causes Oracle to >> crash. Adam is looking into it. >> >> I got another window I can look at it tomorrow and I will try to use >> profile provider. >> I will also try to run the same Oracle directly under Solaris 10 >> global zone and see if it behaves the same way or not. >> >> _______________________________________________ >> dtrace-discuss mailing list >> dtrace-discuss at opensolaris.org > _______________________________________________ > dtrace-discuss mailing list > dtrace-discuss at opensolaris.org-- Adam Leventhal, Fishworks http://blogs.sun.com/ahl
Edward Pilatowicz
2010-Mar-30 17:58 UTC
[dtrace-discuss] libc:semop() consumes lots of CPU inside Solaris 9 branded zone
i haven''t tried this myself, but from roberts description it sounds like it causes the application you attempt to trace to crash. ed On Tue, Mar 30, 2010 at 10:54:27AM -0700, Adam Leventhal wrote:> Hey Ed, > > What''s the failure mode of the pid provider in a branded zone? I imagine it could be quite nasty. > > Adam > > On Mar 30, 2010, at 10:23 AM, Edward Pilatowicz wrote: > > > oops. my recommendation to use the pid provider was misguided since, as > > one of my co-workers pointed out, currently the pid provider and > > plockstat are not supported on s9 (or s8) branded zones, as documented > > here: > > > > http://docs.sun.com/app/docs/doc/820-4490/gfjbk?a=view > > > > ed > > > > On Tue, Mar 30, 2010 at 03:39:01PM +0100, Robert Milkowski wrote: > >> > >> you might be right and you probably are that''s why I want to use PID > >> provider but unfortunately there is some bad interaction between PID > >> provided and Oracle when running in a S9 zone which causes Oracle to > >> crash. Adam is looking into it. > >> > >> I got another window I can look at it tomorrow and I will try to use > >> profile provider. > >> I will also try to run the same Oracle directly under Solaris 10 > >> global zone and see if it behaves the same way or not. > >> > >> _______________________________________________ > >> dtrace-discuss mailing list > >> dtrace-discuss at opensolaris.org > > _______________________________________________ > > dtrace-discuss mailing list > > dtrace-discuss at opensolaris.org > > > -- > Adam Leventhal, Fishworks http://blogs.sun.com/ahl
Robert Milkowski
2010-Mar-31 08:18 UTC
[dtrace-discuss] libc:semop() consumes lots of CPU inside Solaris 9 branded zone
On 30/03/2010 18:58, Edward Pilatowicz wrote:> i haven''t tried this myself, but from roberts description it sounds like > it causes the application you attempt to trace to crash.Oracle crashes with messages like: Fri Mar 26 12:23:27 2010 Errors in file /i321/ua01/app/oracle/admin/I321/bdump/i321_ckpt_24394.trc: ORA-07445: exception encountered: core dump [0000000000000040] [SIGSEGV] [Address not mapped to object] [64] [] [] Fri Mar 26 12:23:27 2010 Errors in file /i321/ua01/app/oracle/admin/I321/bdump/i321_ckpt_24394.trc: ORA-07445: exception encountered: core dump [00000001008D617C] [SIGSEGV] [Address not mapped to object] [64] [] [] ORA-07445: exception encountered: core dump [0000000000000040] [SIGSEGV] [Address not mapped to object] [64] [] [] Fri Mar 26 12:23:28 2010 Errors in file /i321/ua01/app/oracle/admin/I321/bdump/i321_ckpt_24394.trc: ORA-07445: exception encountered: core dump [00000001008D617C] [SIGSEGV] [Address not mapped to object] [64] [] [] ORA-07445: exception encountered: core dump [00000001008D617C] [SIGSEGV] [Address not mapped to object] [64] [] [] ORA-07445: exception encountered: core dump [0000000000000040] [SIGSEGV] [Address not mapped to object] [64] [] [] Which casues a tablespace to be corrupted as well so it has to be recovered from backup. -- Robert Milkowski http://milek.blogspot.com
Robert Milkowski
2010-Mar-31 08:41 UTC
[dtrace-discuss] libc:semop() consumes lots of CPU inside Solaris 9 branded zone
I used profile provider to look what ustack() was most common for ora_ckpt proccess (tid==1) when a tablespace is being put into a hotbackup mode and Edward was right - it''s not related to semop() or any interposing libraries in S9 zone - it were almost Oracle ustack()s only. So I cloned the database and started exactly the same binaries and data in a global zone. Now putting a tablespace in a backup mode takes well below a second while it takes about 4s for the instance running in a S9 zone... but there is one difference - much smaller buffer cache for oracle SGA running in the global zone (not enough memory to start both of them with the same memory allocations). Forutnatelly I could stop the other instance and start the one in the global zone with the same memory settings so now all setting are the same and now it takes about 4s for the one in a global zone as well. So it seems there is no issue with Oracle 8.1.7 interacting with S9 zone - rather it slows down considerably when large SGA/buffer cache is configured and a tablespace is put into a backup mode (and possibly other operations as I got dba''s complaining of slow performance) - I will experiment more and will probably tune the buffer down and rely on ZFS caching to make use of a memory. Thank you guys for looking into it. -- Robert Milkowski http://milek.blogspot.com
Robert Milkowski
2010-Mar-31 08:45 UTC
[dtrace-discuss] libc:semop() consumes lots of CPU inside Solaris 9 branded zone
it should really be fixed (is there a bug open for it?). In the mean time the dtrace pid provider should reject instrumenting processes in a branded zone instead of potentially crashing an application as in my case. Merely putting a comment in a documentation that PID provider is not supported is not good enough imho.> oops. my recommendation to use the pid provider was misguided since, as > one of my co-workers pointed out, currently the pid provider and > plockstat are not supported on s9 (or s8) branded zones, as documented > here: > > http://docs.sun.com/app/docs/doc/820-4490/gfjbk?a=view > > >