thr3ads.net - dtrace discuss - [dtrace-discuss] libc:semop() consumes lots of CPU inside Solaris 9 branded zone [Mar 2010]

If this information is useful, please help other people find it:
Share via:

Robert Milkowski

2010-Mar-29 10:58 UTC

[dtrace-discuss] libc:semop() consumes lots of CPU inside Solaris 9 branded zone

Hi,

I have an Oracle 8.1.7.4 64bit running in a Solaris 9 branded zone on-top of
Solaris 10.
When I put a tablespace in a hotbackup mode it takes long time - usually more
than 4s for each tablespace. Looks like ora_ckpt process is consuming 100% of
cpu (in usr) while a tablespace is being put in the hotbackup mode.

It looks like there might be an issue with semop() interposition in S9 branded
zone.
I can''t easily test it now but IIRC when running Oracle in a global
zone there was no such issue.

More details below.

First, lets check a PID of ora_ckpt

# ps -ef | grep 20326
  oracle 20326     1   0 15:37:26 ?          15:52 ora_ckpt_I321

Now lets put a TEMP tablespace in a backup mode while running a dtrace script as
well:

SQL> alter tablespace TEMP begin backup;

Tablespace altered.

Elapsed: 00:00:04.65



# dtrace -n sched:::on-cpu''/pid==20326 &&
tid==1/{self->t=timestamp;}'' -n
sched:::off-cpu''/self->t/{@u[ustack()]=sum(timestamp-self->t);@c[ustack()]=count();self->t=0;}''
-n tick-10s''{printa(@u);printa(@c)}''
dtrace: description ''sched:::on-cpu'' matched 1 probe
dtrace: description ''sched:::off-cpu'' matched 2 probes
dtrace: description ''tick-10s'' matched 1 probe
^C


              `_s9brand_systemcall+0x30
              `s9_semsys+0x16c
              `s9_handler+0x84
              oracle`sslssalck+0x514
              libc.so.1`semop+0x18
              oracle`sskgpwwait+0x13c
              oracle`ksliwat+0x44c
              oracle`kslwaitns+0x20
              oracle`kskthbwt+0x174
              oracle`kslwait+0x50
              oracle`ksarcv+0x1bc
              oracle`ksbabs+0x344
              oracle`ksbrdp+0x2ec
              oracle`opirip+0x31c
              oracle`opidrv+0x6a0
              oracle`sou2o+0x10
              oracle`main+0x128
              oracle`_start+0x17c
              oracle`_start+0x18
       3852868427

              oracle`_s9brand_systemcall+0x30
              oracle`s9_semsys+0x16c
              oracle`s9_handler+0x84
              oracle`sslssalck+0x514
              libc.so.1`semop+0x18
              oracle`sskgpwwait+0x13c
              oracle`ksliwat+0x44c
              oracle`kslwaitns+0x20
              oracle`kskthbwt+0x174
              oracle`kslwait+0x50
              oracle`ksarcv+0x1bc
              oracle`ksbabs+0x344
              oracle`ksbrdp+0x2ec
              oracle`opirip+0x31c
              oracle`opidrv+0x6a0
              oracle`sou2o+0x10
              oracle`main+0x128
              oracle`_start+0x17c
              oracle`_start+0x18
               12

So the same ustack() consumed almost 4s of a CPU time out of 4.6s of a total
time of the operation.

Has anyone been hit by it?

Also the 2nd stack looks strange (top 5 lines)...

-- 
Robert Milkowski
http://milek.blogspot.com
-- 
This message posted from opensolaris.org

Edward Pilatowicz

2010-Mar-29 23:22 UTC

head link

[dtrace-discuss] libc:semop() consumes lots of CPU inside Solaris 9 branded zone

hey robert,

so i don''t agree with your interpretation of the dtrace data below. 
the
data doesn''t show a long time spent executing in semop().  instead it
shows that the most common way that a database thread blocks after doing
some processing is by calling semop().

ie, your script takes a timestamp when a thread starts executing
(probably when returning from a semop() operation).  then when the
thread stops executing, you look at the elapsed time and then you''re
attributing that execution time to the function where it stopped
execution.  that doesn''t really seeem right.  i''ll bet that
the majority
of execution time was spent somewhere else, not in semop().  semop() is
just the place where the thread is getting swapped off proc.

ed

On Mon, Mar 29, 2010 at 03:58:13AM -0700, Robert Milkowski
wrote:> Hi,
>
> I have an Oracle 8.1.7.4 64bit running in a Solaris 9 branded zone on-top
of Solaris 10.
> When I put a tablespace in a hotbackup mode it takes long time - usually
more than 4s for each tablespace. Looks like ora_ckpt process is consuming 100%
of cpu (in usr) while a tablespace is being put in the hotbackup mode.
>
> It looks like there might be an issue with semop() interposition in S9
branded zone.
> I can''t easily test it now but IIRC when running Oracle in a
global zone there was no such issue.
>
> More details below.
>
> First, lets check a PID of ora_ckpt
>
> # ps -ef | grep 20326
>   oracle 20326     1   0 15:37:26 ?          15:52 ora_ckpt_I321
>
> Now lets put a TEMP tablespace in a backup mode while running a dtrace
script as well:
>
> SQL> alter tablespace TEMP begin backup;
>
> Tablespace altered.
>
> Elapsed: 00:00:04.65
>
>
>
> # dtrace -n sched:::on-cpu''/pid==20326 &&
tid==1/{self->t=timestamp;}'' -n
sched:::off-cpu''/self->t/{@u[ustack()]=sum(timestamp-self->t);@c[ustack()]=count();self->t=0;}''
-n tick-10s''{printa(@u);printa(@c)}''
> dtrace: description ''sched:::on-cpu'' matched 1 probe
> dtrace: description ''sched:::off-cpu'' matched 2 probes
> dtrace: description ''tick-10s'' matched 1 probe
> ^C
>
>
>               `_s9brand_systemcall+0x30
>               `s9_semsys+0x16c
>               `s9_handler+0x84
>               oracle`sslssalck+0x514
>               libc.so.1`semop+0x18
>               oracle`sskgpwwait+0x13c
>               oracle`ksliwat+0x44c
>               oracle`kslwaitns+0x20
>               oracle`kskthbwt+0x174
>               oracle`kslwait+0x50
>               oracle`ksarcv+0x1bc
>               oracle`ksbabs+0x344
>               oracle`ksbrdp+0x2ec
>               oracle`opirip+0x31c
>               oracle`opidrv+0x6a0
>               oracle`sou2o+0x10
>               oracle`main+0x128
>               oracle`_start+0x17c
>               oracle`_start+0x18
>        3852868427
>
>               oracle`_s9brand_systemcall+0x30
>               oracle`s9_semsys+0x16c
>               oracle`s9_handler+0x84
>               oracle`sslssalck+0x514
>               libc.so.1`semop+0x18
>               oracle`sskgpwwait+0x13c
>               oracle`ksliwat+0x44c
>               oracle`kslwaitns+0x20
>               oracle`kskthbwt+0x174
>               oracle`kslwait+0x50
>               oracle`ksarcv+0x1bc
>               oracle`ksbabs+0x344
>               oracle`ksbrdp+0x2ec
>               oracle`opirip+0x31c
>               oracle`opidrv+0x6a0
>               oracle`sou2o+0x10
>               oracle`main+0x128
>               oracle`_start+0x17c
>               oracle`_start+0x18
>                12
>
> So the same ustack() consumed almost 4s of a CPU time out of 4.6s of a
total time of the operation.
>
> Has anyone been hit by it?
>
> Also the 2nd stack looks strange (top 5 lines)...
>
> --
> Robert Milkowski
> http://milek.blogspot.com
> --
> This message posted from opensolaris.org
> _______________________________________________
> dtrace-discuss mailing list
> dtrace-discuss at opensolaris.org
>

Robert Milkowski

2010-Mar-30 14:39 UTC

head link

[dtrace-discuss] libc:semop() consumes lots of CPU inside Solaris 9 branded zone

you might be right and you probably are that''s why I want to use PID 
provider but unfortunately there is some bad interaction between PID 
provided and Oracle when running in a S9 zone which causes Oracle to 
crash. Adam is looking into it.

I got another window I can look at it tomorrow and I will try to use 
profile provider.
I will also try to run the same Oracle directly under Solaris 10 global 
zone and see if it behaves the same way or not.

Edward Pilatowicz

2010-Mar-30 17:23 UTC

head link

[dtrace-discuss] libc:semop() consumes lots of CPU inside Solaris 9 branded zone

oops.  my recommendation to use the pid provider was misguided since, as
one of my co-workers pointed out, currently the pid provider and
plockstat are not supported on s9 (or s8) branded zones, as documented
here:

	http://docs.sun.com/app/docs/doc/820-4490/gfjbk?a=view

ed

On Tue, Mar 30, 2010 at 03:39:01PM +0100, Robert Milkowski
wrote:>
> you might be right and you probably are that''s why I want to use
PID
> provider but unfortunately there is some bad interaction between PID
> provided and Oracle when running in a S9 zone which causes Oracle to
> crash. Adam is looking into it.
>
> I got another window I can look at it tomorrow and I will try to use
> profile provider.
> I will also try to run the same Oracle directly under Solaris 10
> global zone and see if it behaves the same way or not.
>
> _______________________________________________
> dtrace-discuss mailing list
> dtrace-discuss at opensolaris.org

Adam Leventhal

2010-Mar-30 17:54 UTC

head link

[dtrace-discuss] libc:semop() consumes lots of CPU inside Solaris 9 branded zone

Hey Ed,

What''s the failure mode of the pid provider in a branded zone? I
imagine it could be quite nasty.

Adam

On Mar 30, 2010, at 10:23 AM, Edward Pilatowicz wrote:
> oops.  my recommendation to use the pid provider was misguided since, as
> one of my co-workers pointed out, currently the pid provider and
> plockstat are not supported on s9 (or s8) branded zones, as documented
> here:
> 
> 	http://docs.sun.com/app/docs/doc/820-4490/gfjbk?a=view
> 
> ed
> 
> On Tue, Mar 30, 2010 at 03:39:01PM +0100, Robert Milkowski wrote:
>> 
>> you might be right and you probably are that''s why I want to
use PID
>> provider but unfortunately there is some bad interaction between PID
>> provided and Oracle when running in a S9 zone which causes Oracle to
>> crash. Adam is looking into it.
>> 
>> I got another window I can look at it tomorrow and I will try to use
>> profile provider.
>> I will also try to run the same Oracle directly under Solaris 10
>> global zone and see if it behaves the same way or not.
>> 
>> _______________________________________________
>> dtrace-discuss mailing list
>> dtrace-discuss at opensolaris.org
> _______________________________________________
> dtrace-discuss mailing list
> dtrace-discuss at opensolaris.org

--
Adam Leventhal, Fishworks                        http://blogs.sun.com/ahl

Edward Pilatowicz

2010-Mar-30 17:58 UTC

head link

[dtrace-discuss] libc:semop() consumes lots of CPU inside Solaris 9 branded zone

i haven''t tried this myself, but from roberts description it sounds
like
it causes the application you attempt to trace to crash.

ed

On Tue, Mar 30, 2010 at 10:54:27AM -0700, Adam Leventhal
wrote:> Hey Ed,
>
> What''s the failure mode of the pid provider in a branded zone? I
imagine it could be quite nasty.
>
> Adam
>
> On Mar 30, 2010, at 10:23 AM, Edward Pilatowicz wrote:
>
> > oops.  my recommendation to use the pid provider was misguided since,
as
> > one of my co-workers pointed out, currently the pid provider and
> > plockstat are not supported on s9 (or s8) branded zones, as documented
> > here:
> >
> > 	http://docs.sun.com/app/docs/doc/820-4490/gfjbk?a=view
> >
> > ed
> >
> > On Tue, Mar 30, 2010 at 03:39:01PM +0100, Robert Milkowski wrote:
> >>
> >> you might be right and you probably are that''s why I want
to use PID
> >> provider but unfortunately there is some bad interaction between
PID
> >> provided and Oracle when running in a S9 zone which causes Oracle
to
> >> crash. Adam is looking into it.
> >>
> >> I got another window I can look at it tomorrow and I will try to
use
> >> profile provider.
> >> I will also try to run the same Oracle directly under Solaris 10
> >> global zone and see if it behaves the same way or not.
> >>
> >> _______________________________________________
> >> dtrace-discuss mailing list
> >> dtrace-discuss at opensolaris.org
> > _______________________________________________
> > dtrace-discuss mailing list
> > dtrace-discuss at opensolaris.org
>
>
> --
> Adam Leventhal, Fishworks                        http://blogs.sun.com/ahl

Robert Milkowski

2010-Mar-31 08:18 UTC

head link

[dtrace-discuss] libc:semop() consumes lots of CPU inside Solaris 9 branded zone

On 30/03/2010 18:58, Edward Pilatowicz wrote:> i haven''t tried this myself, but from roberts description it
sounds like
> it causes the application you attempt to trace to crash.Oracle crashes with messages like:

Fri Mar 26 12:23:27 2010
Errors in file /i321/ua01/app/oracle/admin/I321/bdump/i321_ckpt_24394.trc:
ORA-07445: exception encountered: core dump [0000000000000040] [SIGSEGV] 
[Address not mapped to object] [64] [] []
Fri Mar 26 12:23:27 2010
Errors in file /i321/ua01/app/oracle/admin/I321/bdump/i321_ckpt_24394.trc:
ORA-07445: exception encountered: core dump [00000001008D617C] [SIGSEGV] 
[Address not mapped to object] [64] [] []
ORA-07445: exception encountered: core dump [0000000000000040] [SIGSEGV] 
[Address not mapped to object] [64] [] []
Fri Mar 26 12:23:28 2010
Errors in file /i321/ua01/app/oracle/admin/I321/bdump/i321_ckpt_24394.trc:
ORA-07445: exception encountered: core dump [00000001008D617C] [SIGSEGV] 
[Address not mapped to object] [64] [] []
ORA-07445: exception encountered: core dump [00000001008D617C] [SIGSEGV] 
[Address not mapped to object] [64] [] []
ORA-07445: exception encountered: core dump [0000000000000040] [SIGSEGV] 
[Address not mapped to object] [64] [] []

Which casues a tablespace to be corrupted as well so it has to be 
recovered from backup.


-- 
Robert Milkowski
http://milek.blogspot.com

Robert Milkowski

2010-Mar-31 08:41 UTC

head link

[dtrace-discuss] libc:semop() consumes lots of CPU inside Solaris 9 branded zone

I used profile provider to look what ustack() was most common for 
ora_ckpt proccess (tid==1) when a tablespace is being put into a 
hotbackup mode and Edward was right - it''s not related to semop() or
any
interposing libraries in S9 zone - it were almost Oracle ustack()s only.

So I cloned the database and started exactly the same binaries and data 
in a global zone.
Now putting a tablespace in a backup mode takes well below a second 
while it takes about 4s for the instance running in a S9 zone... but 
there is one difference - much smaller buffer cache for oracle SGA 
running in the global zone (not enough memory to start both of them with 
the same memory allocations). Forutnatelly I could stop the other 
instance and start the one in the global zone with the same memory 
settings so now all setting are the same and now it takes about 4s for 
the one in a global zone as well.

So it seems there is no issue with Oracle 8.1.7 interacting with S9 zone 
- rather it slows down considerably when large SGA/buffer cache is 
configured and a tablespace is put into a backup mode (and possibly 
other operations as I got dba''s complaining of slow performance) - I 
will experiment more and will probably tune the buffer down and rely on 
ZFS caching to make use of a memory.

Thank you guys for looking into it.


-- 
Robert Milkowski
http://milek.blogspot.com

Robert Milkowski

2010-Mar-31 08:45 UTC

head link

[dtrace-discuss] libc:semop() consumes lots of CPU inside Solaris 9 branded zone

it should really be fixed (is there a bug open for it?).
In the mean time the dtrace pid provider should reject instrumenting 
processes in a branded zone instead of potentially crashing an 
application as in my case. Merely putting a comment in a documentation 
that PID provider is not supported is not good enough imho.
> oops.  my recommendation to use the pid provider was misguided since, as
> one of my co-workers pointed out, currently the pid provider and
> plockstat are not supported on s9 (or s8) branded zones, as documented
> here:
>
> 	http://docs.sun.com/app/docs/doc/820-4490/gfjbk?a=view
>
>
>

dtrace discuss - Mar 2010 - libc:semop() consumes lots of CPU inside Solaris 9 branded zone

[dtrace-discuss] libc:semop() consumes lots of CPU inside Solaris 9 branded zone

[dtrace-discuss] libc:semop() consumes lots of CPU inside Solaris 9 branded zone

[dtrace-discuss] libc:semop() consumes lots of CPU inside Solaris 9 branded zone

[dtrace-discuss] libc:semop() consumes lots of CPU inside Solaris 9 branded zone

[dtrace-discuss] libc:semop() consumes lots of CPU inside Solaris 9 branded zone

[dtrace-discuss] libc:semop() consumes lots of CPU inside Solaris 9 branded zone

[dtrace-discuss] libc:semop() consumes lots of CPU inside Solaris 9 branded zone

[dtrace-discuss] libc:semop() consumes lots of CPU inside Solaris 9 branded zone

[dtrace-discuss] libc:semop() consumes lots of CPU inside Solaris 9 branded zone