Hi,

I'm trying to use dtrace to signal threads in my app when certain events happen, and the raise() action seemed adequate: looking at the kernel sources, it sends a signal to the currently executing thread, and a quick microbenchmark confirmed this. However, further testing showed that if lots of threads hit the event simultaneously, the signal only gets delivered once, and may also get delivered to a different thread entirely (i.e. the main thread blocked in pthread_join). This makes sense given that raise() only promises to target the process, not a given thread.

Would it be possible to add an equivalent to lwp_kill() that specifically targets a thread?

(I also tried setting a watchpoint using procfs, but copyout errors out instead of raising the thread-specific SIGTRAP I had hoped for...)

Thanks,
Ryan

--
This message posted from opensolaris.org
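The "only gets delivered once" symptom is consistent with how ordinary (non-realtime) signals behave when several are generated before any is delivered: they typically collapse into a single pending signal. The following is a minimal user-space sketch of that effect, not DTrace code. The function name count_coalesced_deliveries() and the choice of SIGUSR1 are assumptions made for illustration, and POSIX leaves it implementation-defined whether such signals are delivered more than once.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Number of times the handler has run since the last reset. */
static volatile sig_atomic_t deliveries = 0;

static void on_usr1(int sig)
{
    (void)sig;
    deliveries++;
}

/*
 * Hypothetical helper: generate n process-directed SIGUSR1 signals while
 * the signal is blocked, then unblock it and report how many times the
 * handler actually ran. Returns -1 on setup failure.
 */
int count_coalesced_deliveries(int n)
{
    struct sigaction sa;
    sigset_t block, old;

    assert(n > 0);

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_usr1;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;

    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    if (sigprocmask(SIG_BLOCK, &block, &old) != 0)
        return -1;

    deliveries = 0;
    for (int i = 0; i < n; i++)
        kill(getpid(), SIGUSR1);    /* undirected: targets the process */

    /* Restoring the old mask lets the pending signal be delivered. */
    sigprocmask(SIG_SETMASK, &old, NULL);
    return (int)deliveries;
}
```

Calling count_coalesced_deliveries(5) from a single-threaded program reports how many of the five signals actually reached the handler; on systems that do not queue standard signals the count is lower than the number sent.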
On Mon, Oct 06, 2008 at 11:07:39AM -0700, Ryan wrote:
> Would it be possible to add an equivalent to lwp_kill() that
> specifically targets a thread?

It might be possible to raise() SIGCANCEL. I don't think DTrace prevents you from doing that, but I'm not sure that it will work either.

But then, from code inspection my guess is "no, that won't work" because issig_forreal() doesn't post signals from DTrace as directed signals (it calls sigaddq() with t == NULL).

Nico
--
I don't think threads are supposed to catch SIGCANCEL; you're not allowed to signal them with it in user space, at least:

man pthread_kill:
    The sig argument must be one of the signals listed in signal.h(3HEAD), with the exception of SIGCANCEL being reserved and off limits to pthread_kill().
> Would it be possible to add an equivalent to
> lwp_kill() that specifically targets a thread?

Hmm... after poking around in the code a bit, it's not clear that this would be easy to implement (some details below). However, it looks much easier to change raise() to accept a second, optional argument. The updated wiki entry might read like this:

=========================
void raise(int sig, int directed)
void raise(int sig)

The /raise/ action sends the specified signal to the currently running process, or, if /directed/ is non-zero, to the currently running thread.
=========================

Given that there's no real way for probes on different threads to coordinate, a directed raise() probably covers most potential uses for lwp_kill() anyway.

Thoughts?
Ryan

Details on why implementing lwp_kill() might be hard:

With raise() dtrace just marks the thread and lets regular signal handling pick it up "soon"; to target a specific other thread would require finding that thread and marking it, which in turn would require acquiring the process lock (which dtrace doesn't normally touch) and performing several other actions that raise() defers to the regular signal handling code. Whether this deferral is due to some constraint or whether it was just easier to code up and maintain that way, I wouldn't know.

Alternatively, dtrace could just store the request for later, but then it can't tell the user if the target lwp exists or not. Plus, a thread could accumulate any number of pending lwp kills before they get processed.
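The difference between the two proposed forms of raise() can be approximated in user space. This is only a sketch under stated assumptions: raise_sig() and directed_lands_on_caller() are hypothetical names, SIGUSR2 is an arbitrary choice, and nothing here reflects how DTrace itself would implement the flag. A directed request maps to pthread_kill() on the calling thread, an undirected one to kill() on the process.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

static pthread_t handler_thread;              /* thread the handler ran on */
static volatile sig_atomic_t handler_ran = 0;

static void on_usr2(int sig)
{
    (void)sig;
    handler_thread = pthread_self();
    handler_ran = 1;
}

/* Hypothetical analogue of raise(sig, directed); returns 0 on success. */
int raise_sig(int sig, int directed)
{
    if (directed)
        return pthread_kill(pthread_self(), sig);  /* this thread only */
    return kill(getpid(), sig);                    /* any eligible thread */
}

/* Worker: send itself a directed signal and note where the handler ran. */
static void *worker(void *arg)
{
    int *landed_here = arg;

    if (raise_sig(SIGUSR2, 1) == 0 && handler_ran)
        *landed_here = pthread_equal(handler_thread, pthread_self()) != 0;
    return NULL;
}

/* Returns 1 if the directed signal was handled on the raising thread. */
int directed_lands_on_caller(void)
{
    struct sigaction sa;
    pthread_t tid;
    int landed_here = 0;
    int rc;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_usr2;
    sigemptyset(&sa.sa_mask);
    rc = sigaction(SIGUSR2, &sa, NULL);
    assert(rc == 0);
    (void)rc;

    handler_ran = 0;
    if (pthread_create(&tid, NULL, worker, &landed_here) != 0)
        return -1;
    pthread_join(tid, NULL);
    return landed_here;
}
```

With the directed form the handler runs on the worker that raised the signal, even though the main thread also has the signal unblocked while it waits in pthread_join().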
Hey Ryan,

I think what you've identified is a bug rather than an RFE: what you describe is how raise() was intended to work. It sounds like there are cases in which that's falling down and that should be addressed. If you haven't already, please file a bug on bugs.opensolaris.org. Thanks.

Adam

On Oct 8, 2008, at 3:54 PM, Ryan wrote:
>> Would it be possible to add an equivalent to
>> lwp_kill() that specifically targets a thread?
>
> Hmm... after poking around in the code a bit, it's not clear that
> this would be easy to implement (some details below). However, it
> looks much easier to change raise() to accept a second, optional
> argument.
> [...]
> _______________________________________________
> dtrace-discuss mailing list
> dtrace-discuss at opensolaris.org

--
Adam Leventhal, Fishworks                    blogs.sun.com/ahl
Hi Adam,

It was already submitted as an RFE and got accepted yesterday as CR 6757869. The search form on the main page can't find it though; do change requests live somewhere else? Below is an excerpt from the confirmation email. Feel free to bump it up from its current "P5-very low" priority if you can find it ;)

Note that if raise() really is supposed to be directed at all times, there's a *very* easy fix in issig_forreal(), as nico pointed out:

usr/src/uts/common/os/sig.c
530c530
<                               sigaddq(p, NULL, &info, KM_NOSLEEP);
---
>                               sigaddq(p, t, &info, KM_NOSLEEP);

However, I can imagine situations where an undirected signal would be useful (like if the current thread might mask the signal you care about).

BTW, is there any way to make the above change on my Solaris 10 system (T5220)? Or would I need to move to SolarisExpress and recompile the kernel?

Regards,
Ryan

bugmail-sender at sun.com said:
> *Synopsis*: No equivalent to lwp_kill() in dtrace
>
> *Change Request ID*: 6757869
>
> Product: solaris
> Category: opensolaris
> Subcategory: triage-queue
> Type: RFE
> Subtype:
> Status: 1-Dispatched
> Substatus:
> Priority: 5-Very Low
> Introduced In Release:
> Introduced In Build:
> Responsible Engineer:
> Keywords: opensolaris
>
> === *Description* ===========================================================
> Category
>    kernel
> Sub-Category
>    dtrace
> Description
>    The dtrace documentation claims that "the raise action can be used to send a signal at a precise point in a process's execution." This is only really true for a single-threaded process and/or a single-processor machine. The raise() action *usually* delivers the resulting signal to the lwp the probe fired on, but there is no guarantee, especially if several probes raise the same signal on multiple lwps simultaneously. Though this behavior is consistent with the semantics of kill(), an equivalent to lwp_kill() would be extremely useful for truly precise delivery of signals in multi-threaded processes.
> Frequency
>    Occasionally
> Regression
>    No
> Steps to Reproduce
>    Trace a multithreaded process and call raise() from several lwps at the same time
> Expected Result
>    The signal is delivered to each lwp
> Actual Result
>    Targeted lwps do not always receive the signal and non-targeted lwps may receive one instead
On Thu, Oct 09, 2008 at 11:36:59PM -0700, Ryan wrote:
> Note that if raise() really is supposed to be directed at all times, there's a *very* easy fix in issig_forreal(), as nico pointed out:
>
> usr/src/uts/common/os/sig.c
> 530c530
> <                               sigaddq(p, NULL, &info, KM_NOSLEEP);
> ---
> >                               sigaddq(p, t, &info, KM_NOSLEEP);
>
> However, I can imagine situations where an undirected signal would be useful (like if the current thread might mask the signal you care about).
>
> BTW, is there any way to make the above change on my Solaris 10 system (T5220)? Or would I need to move to SolarisExpress and recompile the kernel?

I believe there is an issue here: signals from outside a process are always undirected. The only way to send a directed signal is to attach to the process using /proc.

Directed signals can mess up the state of a multi-threaded process. Also, they can have non-intuitive effects (what if you want to SIGABRT a process to generate a core file, but the thread you raise() it on has all signals masked?)

That said, I don't think there's a problem with letting undirected signals (there are plenty of other destructive ways to change state); I just don't think it should be the default.

Cheers,
- jonathan
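The concern about masked signals can be seen in user space with a small experiment. The sketch below assumes a POSIX threads environment; directed_signal_stays_pending() is a hypothetical name and SIGUSR1 is an arbitrary stand-in for the signal being raised. A signal directed at a thread that has it blocked just sits in the pending set and the handler does not run until the thread unblocks it, whereas an undirected signal could have been picked up by some other thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t usr1_handled = 0;

static void on_usr1(int sig)
{
    (void)sig;
    usr1_handled = 1;
}

/*
 * Hypothetical check: block SIGUSR1 in the calling thread, direct the
 * signal at that same thread, and report whether it is stuck pending
 * (1) or was handled anyway (0).
 */
int directed_signal_stays_pending(void)
{
    struct sigaction sa;
    sigset_t block, old, pending;
    int rc, stuck;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_usr1;
    sigemptyset(&sa.sa_mask);
    rc = sigaction(SIGUSR1, &sa, NULL);
    assert(rc == 0);

    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    rc = pthread_sigmask(SIG_BLOCK, &block, &old);
    assert(rc == 0);
    (void)rc;

    usr1_handled = 0;
    pthread_kill(pthread_self(), SIGUSR1);   /* directed at a masked thread */

    sigemptyset(&pending);
    sigpending(&pending);
    stuck = !usr1_handled && sigismember(&pending, SIGUSR1) == 1;

    /* Restore the caller's mask; the pending signal can now be delivered. */
    pthread_sigmask(SIG_SETMASK, &old, NULL);
    return stuck;
}
```

This is the trade-off under discussion: directed delivery is precise, but it is at the mercy of the target thread's signal mask, which is one argument for keeping the undirected form as the default.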
> > usr/src/uts/common/os/sig.c
> > 530c530
> > <                               sigaddq(p, NULL, &info, KM_NOSLEEP);
> > ---
> > >                               sigaddq(p, t, &info, KM_NOSLEEP);
> >
> > However, I can imagine situations where an undirected signal would be useful (like if the current thread might mask the signal you care about).
>
> I believe there is an issue here: signals from outside a process are always
> undirected. The only way to send a directed signal is to attach to the process using /proc.
>
> Directed signals can mess up the state of a multi-threaded process.
> Also, they can have non-intuitive effects (what if you want to SIGABRT
> a process to generate a core file, but the thread you raise() it on has all signals masked?)
>
> That said, I don't think there's a problem with letting undirected signals
> (there are plenty of other destructive ways to change state); I just don't think it should be the default.
>
> Cheers,
> - jonathan

So we agree that it's good to keep undirected signals available, then. Adding the optional 'directed' flag to raise() would do the job nicely, I think.

However, if raise() really is supposed to be directed, note that the suggested change in sig.c above *only* affects signals generated by dtrace and, further, would cause the traced thread to signal itself (i.e. nothing comes from "outside"). It would not be possible to target other threads even within the same process, let alone other processes.