przemolicc at poczta.fm
2006-Dec-14 13:35 UTC
[dtrace-discuss] dtrace: processing aborted: Abort due to systemic unresponsiveness - again
Hello,

we have a 2-way (4-core) Opteron server (x220M2):

bash-3.00# uname -a
SunOS e2 5.10 Generic_118855-19 i86pc i386 i86pc
bash-3.00# psrinfo -v
Status of virtual processor 0 as of: 12/14/2006 14:08:12
  on-line since 12/13/2006 11:41:29.
  The i386 processor operates at 2613 MHz,
        and has an i387 compatible floating point processor.
Status of virtual processor 1 as of: 12/14/2006 14:08:12
  on-line since 12/13/2006 11:41:34.
  The i386 processor operates at 2613 MHz,
        and has an i387 compatible floating point processor.
Status of virtual processor 2 as of: 12/14/2006 14:08:12
  on-line since 12/13/2006 11:41:36.
  The i386 processor operates at 2613 MHz,
        and has an i387 compatible floating point processor.
Status of virtual processor 3 as of: 12/14/2006 14:08:12
  on-line since 12/13/2006 11:41:38.
  The i386 processor operates at 2613 MHz,
        and has an i387 compatible floating point processor.

I was trying to catch syscalls and got an unexpected message:

bash-3.00# dtrace -n 'syscall::: { @[execname] = count (); }'
dtrace: description 'syscall::: ' matched 454 probes
dtrace: processing aborted: Abort due to systemic unresponsiveness

Because the server is not busy and is being prepared for production, I was
surprised, so I ran dtrace again while watching the system with vmstat.
Below is the vmstat output:

[...]
 0 0 0 14546888 7392452 0   0    0  0  0  0  0  0  0  0  0   668  160  316  0  0 100
 0 0 0 14546888 7392452 0   0    0  0  0  0  0  2  2  1  1   671  130  268  0  0 100
 0 0 0 14546888 7392452 0   0    0  0  0  0  0  3  3  1  1   738  250  326  0  0 100
 0 0 0 14546888 7392452 0   0    0  0  0  0  0  2  2  1  1   690  145  284  0  0 100
 0 0 0 14546888 7392452 0   0    0  0  0  0  0  0  0  0  0   651  161  285  0  0 100
 0 0 0 14546888 7392452 0   0    0  0  0  0  0  0  0  0  0   687  284  316  0  0 100
 0 0 0 14546888 7392452 0   0    0  0  0  0  0  0  0  0  0   634  146  268  0  0 100
 0 0 0 14546888 7392452 0   0    0  0  0  0  0  0  0  0  0   644  150  271  0  0 100
 0 0 0 14546284 7391632 143 288  0  0  0  0  0  0  0  0  0   692  910  342  0  0 100
 0 0 0 14459332 7312120 228 3332 0  0  0  0  0  0  0  0  0   884 2050  293  5  4  91 [1]
 0 0 0 14459332 7312116 0   1    0  0  0  0  0  0  0  0  0   865  204  286  0  1  99
 kthr      memory            page            disk          faults      cpu
 r b w   swap  free  re  mf pi po fr de sr cd cd m2 m3   in   sy   cs us sy id
 0 0 0 14459332 7312116 0   0    0  0  0  0  0  0  0  0  0   889  232  306  0  0 100
 0 0 0 14459332 7312116 0   0    0  0  0  0  0  1  1  0  0   938  326  353  0  0 100
 0 0 0 14546272 7391616 0   1    0  0  0  0  0  0  0  0  0 53402 1389 1109  4  3  93 [2]
 0 0 0 14546272 7391616 148 362  0  0  0  0  0  1  1  1  1  1691 3243 1915  3  1  96
 0 0 0 14545648 7390776 0   0    0  0  0  0  0  0  0  0  0   635   97  264  0  0 100
 0 0 0 14545648 7390776 1   3    0  0  0  0  0  0  0  0  0   687  210  340  0  0 100
 0 0 0 14545648 7390772 0   0    0  0  0  0  0  0  0  0  0   718  289  380  0  0 100
 0 0 0 14545648 7390772 0   0    0  0  0  0  0  0  0  0  0   676  224  322  0  0 100
 0 0 0 14545648 7390772 0   0    0  0  0  0  0  0  0  0  0   641  126  278  0  0 100
 0 0 0 14545648 7390772 0   0    0  0  0  0  0  0  0  0  0   642  127  263  0  0 100
[...]

[1] marks when I ran dtrace and [2] marks when I got the
"unresponsiveness" message.

I have read the relevant topic:
http://www.opensolaris.org/jive/thread.jspa?messageID=15073&
and am aware that:
- enabling destructive actions (-w), or
- tuning the deadman parameters below:
    dtrace_deadman_user
    dtrace_deadman_interval
    dtrace_deadman_timeout
can be helpful. I agree that all of these are useful when the server is really
busy, but I wouldn't expect such behaviour on an idle server!

Is there any way to solve the problem without those tweaks? I would also like
to understand the nature of the problem better.
Regards
przemol
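(For anyone who wants to try the deadman tuning mentioned above before a fix
is available, here is a minimal sketch of how it is commonly done. It assumes
the deadman variables live in the dtrace kernel module and that
dtrace_deadman_timeout is a nanosecond value with a default of roughly ten
seconds; verify both against your build before relying on it.)

* /etc/system: raise the deadman timeout to 60 seconds (value in
* nanoseconds); takes effect at the next reboot.
set dtrace:dtrace_deadman_timeout=60000000000

# Or patch the running kernel with mdb (reverts at reboot):
echo "dtrace_deadman_timeout/Z 0t60000000000" | mdb -kw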
Jon Haslam
2006-Dec-15 21:56 UTC
[dtrace-discuss] dtrace: processing aborted: Abort due to systemic unresponsiveness - again
Hi,

Yes, this looks to be an issue. I've had the same problem on some
dual core M2 systems and so has a colleague.

By the look of it, our high res timer gets severely out of whack
every now and then, which causes us to think that things have hung.
I'll let you know when I know more.

Jon.

> I was trying to catch syscalls and got an unexpected message:
> bash-3.00# dtrace -n 'syscall::: { @[execname] = count (); }'
> dtrace: description 'syscall::: ' matched 454 probes
> dtrace: processing aborted: Abort due to systemic unresponsiveness
> [...]
Dan Mick
2006-Dec-15 22:02 UTC
[dtrace-discuss] dtrace: processing aborted: Abort due to systemic unresponsiveness - again
Jon Haslam wrote:
> Yes, this looks to be an issue. I've had the same problem on some
> dual core M2 systems and so has a colleague.
>
> By the look of it, our high res timer gets severely out of whack
> every now and then, which causes us to think that things have hung.
> I'll let you know when I know more.

If that's the problem, it might be fixed by

6342823 Unable to offline CPU 0 on x86 systems

which is in snv_34, and supposed to be in Update 2. I don't know if
the system below is update 2 or not.

(The problem was actually a concomitant fix to timestamp.c under that
bug number, which probably should have had its own bug number. It only
affects multi-CPU machines where the TSCs are very different at boot
time, which is something that has only started happening with the latest
Intel Core2 and AMD RevE/RevF processor BIOSes, apparently... so it's a
latent bug exposed by newer hardware.)
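(A quick, illustrative way to see the kind of boot-time TSC skew described
above is to let DTrace record each CPU's own view of its high-resolution
timer and compare the ranges; the script below is a sketch, not something
from this thread. On a healthy machine the per-CPU min/max ranges overlap,
while on an affected machine one CPU's range sits far away from the others.)

#!/usr/sbin/dtrace -s

/* Sample DTrace's timestamp on every CPU for ten seconds. */
profile-997
{
        @tmin[cpu] = min(timestamp);
        @tmax[cpu] = max(timestamp);
}

/* Print each CPU's observed range and exit. */
tick-10sec
{
        printa("cpu %d  min %@d\n", @tmin);
        printa("cpu %d  max %@d\n", @tmax);
        exit(0);
}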
Jon Haslam
2006-Dec-15 22:08 UTC
[dtrace-discuss] dtrace: processing aborted: Abort due to systemic unresponsiveness - again
>> By the look of it, our high res timer gets severely out of whack
>> every now and then, which causes us to think that things have hung.
>> I'll let you know when I know more.
>
> If that's the problem, it might be fixed by
>
> 6342823 Unable to offline CPU 0 on x86 systems
>
> which is in snv_34, and supposed to be in Update 2. I don't know if
> the system below is update 2 or not.

Thanks Dan, but this happens with the snv gate bits as well. I'll contact
you off-alias with my findings so far in case they ring any bells.

Jon.
przemolicc at poczta.fm
2006-Dec-18 08:06 UTC
[dtrace-discuss] dtrace: processing aborted: Abort due to systemic unresponsiveness - again
On Fri, Dec 15, 2006 at 02:02:43PM -0800, Dan Mick wrote:
> If that's the problem, it might be fixed by
>
> 6342823 Unable to offline CPU 0 on x86 systems
>
> which is in snv_34, and supposed to be in Update 2. I don't know if
> the system below is update 2 or not.

As Jon said, it is not fixed in update 2:

host:> cat /etc/release
                       Solaris 10 6/06 s10x_u2wos_09a X86
           Copyright 2006 Sun Microsystems, Inc.  All Rights Reserved.
                        Use is subject to license terms.
                             Assembled 09 June 2006

przemol
Jon Haslam
2006-Dec-22 18:51 UTC
[dtrace-discuss] dtrace: processing aborted: Abort due to systemic unresponsiveness - again
Hi,

Apologies for taking a while to get back to you on this. This issue is
to do with how we handle (or don't handle) TSCs being different across
CPUs in the timer function that DTrace is using. I've logged the
following CR for this:

6507659 tsc differences between CPU's give dtrace_gethrtime() serious problems

Hopefully a fix will be coming shortly.

Cheers.

Jon.
Robert Milkowski
2007-Apr-19 00:40 UTC
[dtrace-discuss] Re: dtrace: processing aborted: Abort due to systemic
I can see it's been fixed in snv_58.
Is an S10 patch planned?
Should I ask for an escalation via support?
Jon Haslam
2007-Apr-20 17:23 UTC
[dtrace-discuss] Re: dtrace: processing aborted: Abort due to systemic
Hi Robert,

> I can see it's been fixed in snv_58.
> Is an S10 patch planned?
> Should I ask for an escalation via support?

I can't be sure that you are hitting the same problem, but I assume you
are referring to:

6507659 tsc differences between CPU's give dtrace_gethrtime serious problems

If so, then it looks like there is an escalation open for this. If you
need immediate relief you should contact your support channel, who will
probably be able to get you an IDR patch.

You may be aware of this already, but I'll mention it anyway: enable
destructive actions to stop the consumer aborting. Of course, the very
nature of the problem means that your high-resolution timestamps could
occasionally be wrong, so you need to bear that in mind when using the
'timestamp' built-in (for example).

Jon.
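(As a concrete example of that workaround, the original one-liner from this
thread can be run with destructive actions enabled, which also stops the
deadman from aborting the consumer; on an affected machine any
timestamp-based output may still be unreliable.)

bash-3.00# dtrace -w -n 'syscall::: { @[execname] = count(); }'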