Ok, we've been having a discussion on rgn about the ghostbust code,
and it should really be here.

Regarding:

> >   if (ping_allow_ghostbust && me->p_status == PALIVE && ping_ghostbust >
> >
> > Isn't it possible they bust in some other state?  Shouldn't that be
> > "!= PFREE"?
>
> It would be redundant and in some cases harmful.
>
> Each of the other states already has explicit ghostbust behavior.
> (they're all at either 3 or 6 minutes for outfit and login, respectively)
> PALIVE is the only indeterminate length state that we care about
> ghostbust in.

How would it be harmful?  If we aren't getting pings back during login
or outfit, why not ghostbust it sooner?  Consider that most people who
bust get killed.  And what about observers?  What if some slot gets
stuck in state PDEAD for some reason; shouldn't happen, but who knows?
In short, I think we need robust ping-ghostbust code that handles all
cases.
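
To make the difference concrete, here is a rough sketch of the two
checks side by side.  It is not the input.c code: the enum values other
than PFREE, PALIVE and PDEAD, and the "missed" and "limit" parameters,
are made-up stand-ins, since the real comparison is cut off in the line
quoted above.

```c
/* Hypothetical stand-ins for the slot states.  PFREE, PALIVE and PDEAD
 * are named in this thread; POUTFIT and POBSERV are assumed names, and
 * the real definitions in the server headers may differ. */
enum slot_status { PFREE, POUTFIT, PALIVE, PDEAD, POBSERV };

/* Check as quoted: only slots in PALIVE are ever ping-ghostbusted.
 * "missed" and "limit" are illustrative parameters, not server names. */
static int bust_alive_only(int allow, enum slot_status st,
                           int missed, int limit)
{
    return allow && st == PALIVE && missed > limit;
}

/* Proposed check: any slot that is in use can be ping-ghostbusted. */
static int bust_any_used(int allow, enum slot_status st,
                         int missed, int limit)
{
    return allow && st != PFREE && missed > limit;
}
```

With the second form, a slot stuck in PDEAD or an observer that stops
answering pings gets busted as well; a free slot never does.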

Sending ping requests down TCP: Perhaps it would be better to
ghostbust clients having TCP trouble, perhaps not.  At some point,
people with hosed TCP but working UDP will kill their client if they
are stuck, and that should be caught by the ghostbuster code.  So for
now I say make the ghostbust code robust, and think about sending pings
via TCP later on.

Testing hosed link: I can't hose my routing tables, because I don't
have two client machines to test from.  I need to be able to hose just
a single client connection while running an observer client from the
same machine.  I guess I could modify a client to not send ping
requests back.

And while we're on the subject of pings, wouldn't it be useful to have
a ping stat for the last minute?
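
One way this could work, as a minimal sketch: keep the recent
round-trip samples in a small ring buffer and average only those from
the last 60 seconds.  Every name here (ping_window, ping_window_add,
ping_window_mean, the two constants) is invented for illustration and
none of it comes from the server source.

```c
#include <stddef.h>

#define PING_WINDOW_SECS 60   /* "the last minute" */
#define PING_WINDOW_MAX  64   /* room for one ping per second */

struct ping_sample {
    long when;     /* time the response arrived, in seconds */
    int  rtt_ms;   /* measured round trip, in milliseconds */
};

/* Must be zero-initialized before first use. */
struct ping_window {
    struct ping_sample s[PING_WINDOW_MAX];
    size_t head;   /* index where the next sample is written */
    size_t count;  /* number of valid samples stored */
};

/* Record one sample, overwriting the oldest once the buffer is full. */
static void ping_window_add(struct ping_window *w, long now, int rtt_ms)
{
    w->s[w->head].when = now;
    w->s[w->head].rtt_ms = rtt_ms;
    w->head = (w->head + 1) % PING_WINDOW_MAX;
    if (w->count < PING_WINDOW_MAX)
        w->count++;
}

/* Mean RTT over samples newer than PING_WINDOW_SECS; -1 if there are none. */
static int ping_window_mean(const struct ping_window *w, long now)
{
    long sum = 0;
    int n = 0;
    size_t i;

    for (i = 0; i < w->count; i++) {
        if (now - w->s[i].when < PING_WINDOW_SECS) {
            sum += w->s[i].rtt_ms;
            n++;
        }
    }
    return n ? (int)(sum / n) : -1;
}
```

Unlike an average over the whole game, a bad stretch drops out of this
figure a minute after the link recovers.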

-Jeff

Carlos Villalpando <unbelver at earthlink.net> writes:
>
> In article <8766pr4x55.fsf at ccs.neu.edu>, jeffno at ccs.neu.edu says...
> > Carlos Villalpando <unbelver at earthlink.net> writes:
> > >
> > > Ghostbust kicks in if you're waiting too long at the refit window, or if
> > > ping is enabled, you miss sending CP_PING_RESPONSE to 5 consecutive
> > > SP_PING requests. (but set to 60 pings (or 2 minutes) in the .sysdef
> > > file)
> >
> > I'm confused.
>
> I'm here to help *grin*.
>
> >  Where is it 5 if the .sysdef is setting the option?
>
> The default values in data.c.  data.c defines them as 1 second ping, 5
> ping misses, and ping_allow_ghostbust is disabled.
>
> > And why is .sysdef set to 2 minutes?
>
> Beats me.  Somebody got to the sample_sysdef before we did?
>
> In the sysdef, it's 2 seconds, 60 misses and ping_allow_ghostbust enabled.
>
>
> > > Let's try a simple fix first.
> > >
> > > For the server side, make SP_PING a "critical packet."  Critical packets
> > > always get sent down the TCP pipe.
> >
> > What is this fixing?
>
> Well, it's moving the ping request to the more sensitive of the two
> links.  UDP can handle missing a few packets or so and _some_
> ping request/responses are likely to get through even if half the link is
> hosed.  If half the link is hosed for TCP, the pipe will stall and ping
> requests won't make it.  Since ping requests won't make it, client will
> never send ping replies and that will trigger the ghostbust timer more
> often.
>
> Like I said earlier, it is quite possible for network performance to be
> just bad enough not to affect the UDP link too badly, but hose the TCP
> link.  Without the TCP link, you can't, in a timely manner, send or read
> messages, die/rejoin, or do any other critical stuff.  I've personally
> sat trying to rejoin for 5 minutes on continuum and miserably failing,
> while watching what looked to be a good game on the galactic.  My UDP
> link was fine but my TCP link died about 90 seconds before my ship died
> and didn't start up again until 6 minutes later. In those 90 seconds,
> gameplay for me was still very good until I died.  Since pings/responses
> were being sent on the UDP pipe, I never triggered the GB timer.
>
> >  [testing network failure] isn't easy for me to test.
>
> Try putting in a bogus host route on your client machine to the server IP
> address.  That should simulate a dead client->server link.  Or if you
> have access to any of the routers along the way, you can do both
> directions.  You can add and delete the bogus routes to simulate a
> marginal link.
>
> > There also might be a problem in the ping ghostbust.  p_status must be
> > PALIVE.  In input.c:
> >
> >   if (ping_allow_ghostbust && me->p_status == PALIVE && ping_ghostbust >
> >
> > Isn't it possible they bust in some other state?  Shouldn't that be
> > "!= PFREE"?
>
> [reads code....]
>
> It would be redundant and in some cases harmful.
>
> Each of the other states already has explicit ghostbust behavior.
> (they're all at either 3 or 6 minutes for outfit and login, respectively)
> PALIVE is the only indeterminate length state that we care about
> ghostbust in.
>
> In article <8lqqab$10v$1 at venturi.cfr.washington.edu>,
> tap at venturi.cfr.washington.edu (Trent Piepho) says:
>
> > Hmm, so how does this fix anything?
>
> See above.
>
> > It will make the ping times wrong.
>
> Once there is significant lossage, yes.  s->c loss will always be -1 but
> c->s will still be correct.  If there's no pipe stall, rtt should be the
> same.
>
> Ping times, as currently written, however, are rarely correct once
> you get lossage, anyway.  It's a running average.  Monday on continuum, my
> game started out with horrendous and unplayable lag.  At one point my
> std dev was at 2200ms. Average rtt was up in the 1600ms range.  After
> about 10 minutes or so the network cleared up and instantaneous pings
> dropped into the 90ms range with a sub 10ms std. dev.  My reported ping
> times, however, never matched in the 2 hours I played that game.  At the
> end, my reported std. dev was in the 300ms range and my reported rtt was
> in the 130ms range.
>
> To summarize: I don't think it matters.
>
> --Carlos V.