On Wed, Aug 23, 2006 at 12:32:23PM +1000, Stephen Thorne wrote:
> For correctness, "int alarm_count;" should be "sig_atomic_t
> alarm_count;".

Thanks, I've taken this.  I've propagated the SIGALRM changes to the
other robots and done a few tests without observing any problems.

When it came to propagating the changes to ntserv, the main loop wasn't
just a simple pause(2); it was a select(2).  This introduces the
well-known race condition.  See man select:

"Suppose the signal handler sets a global flag and returns. Then a test
of this global flag followed by a call of select() could hang
indefinitely if the signal arrived just after the test but just before
the call."

In the case of ntserv, the select had a timeout, so instead of hanging
indefinitely the server stalled until the timeout expired.  People may
have seen occasional two-second pauses when this happened.

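The popular solution I've seen in other projects is a signal pipe, which
is a fifo into which the signal handler writes the signal number.  The
select then includes the read end of the fifo in its set of file
descriptors to monitor, so a signal arriving at any moment, even just
before the select is entered, leaves the fifo readable and the select
returns at once.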

This is implemented now; look at ntserv/sigpipe.c in my repo, and at the
changes to ntserv/input.c (the main event loop).

-- 
James Cameron    mailto:quozl at us.netrek.org     http://quozl.netrek.org/