/* Make a thread the running thread.  The thread must previously have
   been sleeping, and not holding the CPU semaphore.  This will set
   the thread state to VgTs_Runnable, and the thread will attempt to
   take the CPU semaphore.  By the time it returns, tid will be the
   running thread. */
extern void VG_(set_running) ( ThreadId tid );

/* Set a thread into a sleeping state.  Before the call, the thread
   must be runnable and holding the CPU semaphore.  When this call
   returns, the thread will be set to the specified sleeping state,
   and will not be holding the CPU semaphore.  Note that another
   thread could be running by the time this call returns, so the
   caller must be careful not to touch any shared state.  It is also
   the caller's responsibility to actually block until the thread is
   ready to run again. */
extern void VG_(set_sleeping) ( ThreadId tid, ThreadStatus state );

The master semaphore is run_sema in vg_scheduler.c.

(What happens at a fork?)  VG_(scheduler_init) registers
sched_fork_cleanup as a child atfork handler.  sched_fork_cleanup,
among other things, reinitializes the semaphore with a new pipe so
the child process has its own.

--------------------------------------------------------------------

Re: New World signal handling
From: Jeremy Fitzhardinge
To: Julian Seward
Date: Mon Mar 14 09:03:51 2005

Well, the big-picture things to be clear about are:

 1. signal handlers are process-wide global state
 2. signal masks are per-thread (there's no notion of a process-wide
    signal mask)
 3. a signal can be targeted at either
    1. the whole process (any eligible thread is picked for
       delivery), or
    2. a specific thread

1 is why it is always a bug to temporarily reset a signal handler
(say, for SIGSEGV), because if any other thread happens to be sent
one in that window it will cause havoc (I think there's still one
instance of this in the symtab stuff).

2 is the meat of your questions; more below.

3 is responsible for some of the nitty-gritty detail in the signal
stuff, so it's worth bearing in mind to understand it all.  (Note
that even if a signal is targeted at the whole process, it is only
ever delivered to one particular thread; there's no such thing as a
broadcast signal.)

While a thread is running core code or generated code, it has almost
all its signals blocked (all but the fault signals: SEGV, BUS, ILL,
etc).  Every N basic blocks, each thread calls VG_(poll_signals) to
see what signals are pending for it.  poll_signals grabs the next
pending signal which the client signal mask doesn't block and sets it
up for delivery; it uses the sigtimedwait() syscall to fetch blocked
pending signals rather than have them delivered to a signal handler.
This means we avoid the complexity of having signals delivered
asynchronously via signal handlers; we can just poll for them
synchronously, when they're easy to deal with.

Fault signals, being caused by a specific instruction, are the
exception because they can't be held off; if they're blocked when an
instruction raises one, the kernel will just summarily kill the
process.  Therefore, they need to be unblocked at all times, and the
signal handler is called when an instruction raises one of these
exceptions.

(It's also necessary to call poll_signals after any syscall which may
raise a signal, since signal-raising syscalls are considered to be
synchronous with respect to their signal; ie, calling kill(getpid(),
SIGUSR1) will call the handler for SIGUSR1 before kill is seen to
complete.)
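(For concreteness, here is a tiny standalone illustration of the
sigtimedwait() trick -- it is not Valgrind code, just the bare
mechanism of fetching a blocked, pending signal synchronously instead
of having it delivered to a handler.  SIGUSR1 stands in for "almost
all signals".)

  /* Minimal sketch of sigtimedwait()-based polling.  Not Valgrind's
     code; it only shows that a blocked, pending signal can be picked
     up synchronously with a zero timeout. */

  #define _POSIX_C_SOURCE 200809L
  #include <signal.h>
  #include <stdio.h>
  #include <time.h>
  #include <unistd.h>

  int main(void)
  {
      sigset_t blocked;
      sigemptyset(&blocked);
      sigaddset(&blocked, SIGUSR1);   /* stand-in for "almost all signals" */
      sigprocmask(SIG_BLOCK, &blocked, NULL);

      kill(getpid(), SIGUSR1);        /* pretend someone sent us a signal;
                                         it stays pending because it's blocked */

      /* Poll: a zero timeout means "return at once if nothing is
         pending", which is what a periodic per-N-basic-blocks poll wants. */
      struct timespec zero = { 0, 0 };
      siginfo_t info;
      int signo = sigtimedwait(&blocked, &info, &zero);
      if (signo > 0)
          printf("picked up signal %d synchronously\n", signo);
      else
          printf("nothing pending\n");
      return 0;
  }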
The one time a thread's real signal mask actually matches the
client's requested signal mask is while running a blocking syscall.
We have to set things up to accept signals during a syscall so that
we get the right signal-interrupts-syscall semantics.  The tricky
part is that there's no general atomic
set-signal-mask-and-block-in-syscall mechanism, so we need to fake it
with the stuff in VGA_(_client_syscall)/VGA_(interrupted_syscall).
These two basically form an explicit state machine, where the state
variable is the instruction pointer, which allows it to determine
what point the syscall had got to when the async signal happened.  By
keeping the window where signals are actually unblocked very narrow,
the number of possible states stays small.

This is all quite nice because the kernel does almost all the work of
determining which thread should get a signal, what the correct action
for an interrupted syscall is, etc.  Particularly nice is that we
don't need to worry about all the queuing semantics and the
per-signal special cases (which are, roughly: signals 1-32 are not
queued except when they are, and signals 33-64 are queued except when
they aren't).

BUT, there's another complexity: because the Unix signal mechanism
has been overloaded to deal with two separate kinds of events
(asynchronous signals raised by kill(), and synchronous faults raised
by an instruction), we can't block a signal for one form and not the
other.  That is, because we have to leave SIGSEGV unblocked for
faulting instructions, we are also left open to getting an async
SIGSEGV sent with kill(pid, SIGSEGV).

To handle this, there's a small per-thread signal queue (I'm using
tid 0's queue for "signals sent to the whole process" - a hack, I'll
admit).  If an async SIGSEGV (etc) signal appears, it is pushed onto
the appropriate queue, and VG_(poll_signals) also checks these queues
when deciding what signal to deliver next.  The queues are only
manipulated with *all* signals blocked, so there's no risk of two
concurrent async signal handlers modifying them at once.  Also,
because the likelihood of actually being sent an async SIGSEGV is
pretty low, the queues are only allocated on demand.

There are two mechanisms to prevent disaster if multiple threads get
signals concurrently.  One is that a signal handler is set up to
block a set of signals while the signal is being delivered;
Valgrind's handlers block all signals, so there's no risk of a new
signal being delivered to the same thread until the old handler has
finished.  The other is that if the thread which receives the signal
is not running (ie, doesn't hold the run_sema, which implies it must
be waiting for a syscall to complete), then the signal handler will
grab the run_sema before making any global state changes.  Since the
only time we can get an async signal asynchronously is during a
blocking syscall, this should always be the case.  (And since
synchronous signals are always the result of running an instruction,
we should already be holding run_sema.)

Valgrind will occasionally generate signals for itself.  These are
always synchronous faults, caused either by an instruction fetch or
by something an instruction did.
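(To make the "narrow window" concrete, here is a hedged sketch of the
general shape -- not Valgrind's actual VGA_(_client_syscall) code.
The function and variable names are invented, and read() stands in
for an arbitrary blocking syscall.  Compile with -pthread.)

  #define _POSIX_C_SOURCE 200809L
  #include <signal.h>
  #include <unistd.h>

  /* Run one blocking syscall with the client's signal mask installed.
     The caller is assumed to be running with (almost) all signals
     blocked, as described in the text. */
  static ssize_t read_with_client_mask(int fd, void *buf, size_t n,
                                       const sigset_t *client_mask)
  {
      sigset_t saved;
      ssize_t r;

      /* Window opens: install the client's requested mask so a signal
         can interrupt the syscall with normal EINTR semantics. */
      pthread_sigmask(SIG_SETMASK, client_mask, &saved);

      r = read(fd, buf, n);   /* may block; may return -1 with EINTR */

      /* Window closes: go back to blocking (almost) everything.  A
         signal landing between these mask changes and the read() is
         exactly the race the IP-based state machine in the text has
         to recognise and sort out. */
      pthread_sigmask(SIG_SETMASK, &saved, NULL);

      return r;
  }

  int main(void)
  {
      /* Trivial usage: read one byte from stdin with an empty mask. */
      sigset_t empty;
      sigemptyset(&empty);
      char c;
      return read_with_client_mask(0, &c, 1, &empty) >= 0 ? 0 : 1;
  }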
The two mechanisms are the synth_fault_* functions, which are used to
signal a problem while fetching an instruction, and getting generated
code to call a helper which contains a fault-raising instruction
(used to deal with illegal/unimplemented instructions, and with
instructions whose only job is to raise exceptions).

That all explains how signals come in, but the second part is how
they get delivered.  The main function for this is
VG_(deliver_signal).  There are three cases:

 1. the process is ignoring the signal (SIG_IGN)
 2. the process is using the default handler (SIG_DFL)
 3. the process has a handler for the signal

In general, VG_(deliver_signal) shouldn't be called for ignored
signals; if it has been called anyway, it assumes the ignore is being
overridden (if an instruction gets a SEGV etc, SIG_IGN is ignored and
treated as SIG_DFL).  VG_(deliver_signal) handles the default-handler
case and the client-specified-handler case.

The default-handler case is relatively easy: the signal's default
action is either Terminate or Ignore.  We can ignore Ignore.
Terminate always kills the entire process; there's no such thing as a
thread-specific signal death.  Terminate comes in two forms: with
coredump, or without.  vg_default_action() will write a core file,
tell all the threads to start terminating, and then longjmp back to
the current thread's scheduler loop.  The scheduler loop will
terminate immediately, and the master_tid thread will wait for all
the others to exit before shutting down the process (this is the same
mechanism as exit_group).

Delivering a signal to a client-side handler modifies the thread
state so that there's a signal frame on the stack and the instruction
pointer points at the handler.  The fiddly bit is that there are two
completely different signal frame formats: old and RT.  While in
theory the exact shape of these frames on the stack is abstracted,
there are real programs which know exactly where various parts of the
structures sit on the stack (most notably, g++'s exception-throwing
code), which is why there has to be a separate piece of code for each
frame format.  Another tricky case is dealing with the client stack
running out/overflowing while setting up the signal frame.

Signal return is also interesting.  There are two syscalls, sigreturn
and rt_sigreturn, which a signal handler will use to resume
execution.  The client will call the right one for the frame it was
passed, so the core doesn't need to track that state.  The tricky
part is moving the frame's register state back into the thread's
state, particularly all the FPU state reformatting gunk.  Also,
*sigreturn checks for new pending signals after the old frame has
been cleaned up, since there's a requirement that all deliverable
pending signals are delivered before the mainline code makes
progress.  This means a program could live-lock on signals, but
that's what would happen running natively...

Another thing to watch for: programs which unwind the stack (like
gdb, or exception throwers) recognize the existence of a signal frame
by looking at the code the return address points to: if it is one of
the two specific signal-return sequences, they know it's a signal
frame.  That's why the signal handler return address must point to a
very specific set of instructions.

What else.  Ah, the two internal signals.  SIGVGKILL is pretty
straightforward: it's just used to dislodge a thread from being
blocked in a syscall, so that we can get the thread to terminate in a
timely fashion.
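(To see that dislodging trick in isolation, here is a hedged sketch
-- not Valgrind's code; SIGUSR2 stands in for the private SIGVGKILL
signal, and the helper names are invented.  Sending a thread a signal
whose handler is installed *without* SA_RESTART pops it out of a
blocking syscall with EINTR.  Compile with -pthread.)

  #define _GNU_SOURCE
  #include <errno.h>
  #include <pthread.h>
  #include <signal.h>
  #include <stdio.h>
  #include <string.h>
  #include <unistd.h>

  static void nudge(int sig) { (void)sig; }   /* handler body is irrelevant */

  static void *blocked_thread(void *arg)
  {
      (void)arg;
      int fds[2];
      char c;
      pipe(fds);                          /* nobody ever writes to this pipe */
      if (read(fds[0], &c, 1) == -1 && errno == EINTR)
          printf("dislodged from read() by signal\n");
      return NULL;
  }

  int main(void)
  {
      struct sigaction sa;
      memset(&sa, 0, sizeof sa);
      sa.sa_handler = nudge;              /* note: no SA_RESTART in sa_flags */
      sigemptyset(&sa.sa_mask);
      sigaction(SIGUSR2, &sa, NULL);

      pthread_t t;
      pthread_create(&t, NULL, blocked_thread, NULL);
      sleep(1);                           /* crude: let the thread block */
      pthread_kill(t, SIGUSR2);           /* target that specific thread */
      pthread_join(t, NULL);
      return 0;
  }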
SIGVGCHLD is used by a thread to tell the master_tid that it has
exited.  However, the only time the master_tid cares about this is
when it has already exited itself and is waiting for everyone else to
exit.  If the master_tid hasn't exited, this signal is ignored.  It
isn't enough to simply block it, because that would cause a pile of
queued SIGVGCHLDs to build up, eventually clogging the kernel's
signal delivery mechanism.  If it's unblocked and ignored, it doesn't
interrupt syscalls and it doesn't accumulate.

I hope that helps clarify things.  And explains why there's so much
stuff in there: it's tracking a very complex and arcane underlying
set of machinery.

    J

--------------------------------------------------------------------

> I've been seeing references to 'master thread' around the place.
> What distinguishes the master thread from the rest?  Where does
> the requirement to have a master thread come from?

It used to be tid 1, but I had to generalize it.  The master_tid
isn't very special; its main job is at process shutdown.  It waits
for all the other threads to exit, and then produces all the final
reports.  Until it exits, it's just a normal thread with no other
responsibilities.

The alternative to having a master thread would be to make whichever
thread exits last responsible for emitting all the output.  That
would work, but it would make the results a bit asynchronous (that
is, if the main thread exits and the others hang around for a while,
anyone waiting on the process would see it as having exited, but no
results would have been produced).

VG_(master_tid) is a variable to handle the case where a threaded
program forks.  In the first process, the master_tid will be 1.  If
that program creates a few threads, and then, say, thread 3 forks,
the child process will have a single thread in it.  In the child,
master_tid will be 3.  It was easier to make the master thread a
variable than to try to work out how to rename thread 3 to 1 after a
fork.

    J

--------------------------------------------------------------------

Re: Fwd: Documentation of kernel's signal routing ?
From: David Woodhouse <...>
To: Julian Seward

> Regarding sys_clone created threads.  I have a vague idea that
> there is a notion of 'thread group'.  I further understand that if
> one thread in a group calls sys_exit_group then all threads in that
> group exit.  Whereas if a thread calls sys_exit then just that
> thread exits.
>
> I'm pretty hazy on this:

Hmm, so am I :)

> * Is the above correct?

Yes, I believe so.

> * How is thread-group membership defined/changed?

By specifying CLONE_THREAD in the flags to clone(), you remain part
of the same thread group as the parent.  In a single-threaded
process, the thread group id (tgid) is the same as the pid.

Linux just has tasks, which sometimes happen to share VM -- and now
with NPTL we also share other stuff like signals, etc.  The 'pid' in
Linux is what POSIX would call the 'thread id', and the 'tgid' in
Linux is equivalent to the POSIX 'pid'.

> * Do you know offhand how LinuxThreads and NPTL use thread groups?

I believe that LT doesn't use the kernel's concept of thread groups
at all.  LT predates the kernel's support for proper POSIX-like
sharing of anything much but memory, so it uses only the CLONE_VM
(and possibly CLONE_FILES) flags.  I don't _think_ it uses
CLONE_SIGHAND -- it does most of its work by propagating signals
manually between threads.

NPTL uses thread groups as generated by the CLONE_THREAD flag, which
is what invokes the POSIX-related thread semantics.
> Is it the case that each LinuxThreads thread is in its own group,
> whereas all NPTL threads [in a process] are in a single group?

Yes, that's my understanding.

-- dwmw2
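(For reference, a small standalone sketch -- not part of the mail
above -- showing the pid/tgid split dwmw2 describes: under NPTL each
pthread has its own kernel task id, while getpid() returns the shared
thread group id, ie the POSIX pid.  Compile with -pthread.)

  #define _GNU_SOURCE
  #include <pthread.h>
  #include <stdio.h>
  #include <sys/syscall.h>
  #include <unistd.h>

  static void *show_ids(void *arg)
  {
      (void)arg;
      /* syscall(SYS_gettid) returns the kernel task id (the kernel's
         "pid"); getpid() returns the thread group id (POSIX pid). */
      printf("thread: tid=%ld  pid(tgid)=%d\n",
             (long)syscall(SYS_gettid), getpid());
      return NULL;
  }

  int main(void)
  {
      printf("main:   tid=%ld  pid(tgid)=%d\n",
             (long)syscall(SYS_gettid), getpid());

      pthread_t t;
      pthread_create(&t, NULL, show_ids, NULL);  /* NPTL: clone() with CLONE_THREAD */
      pthread_join(t, NULL);
      return 0;
  }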