dmtpps driver on beaglebone [was: Is it a good idea to use a usb-serial adapter for PPS input? Yes, it is.]

Ian Lepore ian at freebsd.org
Mon Aug 19 15:52:08 UTC 2019


On Mon, 2019-08-19 at 18:22 +0300, Konstantin Belousov wrote:
> On Mon, Aug 19, 2019 at 08:47:46AM -0600, Ian Lepore wrote:
> > I first ran into the problem on a beaglebone a few years ago, not that
> > long after the number was reduced to two.  The number 4 is pretty much
> > just a vague-ish memory of the debugging I did back then.  When you
> > don't have enough timehands, the symptom is that you lose most of the
> > pps events (but enough get captured that ntpd will more or less run
> > properly on the reduced data).
> > 
> > The busier the system is, the more events get captured successfully,
> > because the loss is caused when the pps pulse happens while sleeping in
> > cpu_idle(); the path that unwinds out of there leads to calling
> > tc_windup 3 times, iirc, which means the value latched in the
> > timecounter hardware belongs to a set of timehands whose generation
> > count is not valid anymore because of all the tc_windup calls, and the
> > event gets discarded because of the generation mismatch.
> 
> Is the problem specific to the platform then ?

I think it is specific to timecounters that implement pps capture using
the polling mechanism (tc_poll_pps pointer is non-NULL), and I remember
that there was something about the sequence of events that made me
think it would only be a problem on a single-core system.  Or maybe
just way more likely on a single-core system (I've got similarly-
structured drivers on multicore systems that don't have this problem). 
But I don't think it was so much platform-specific as specific to that
pair of conditions.
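
To make the "polling mechanism" part concrete: a timecounter opts into
polled pps capture just by setting tc_poll_pps, and tc_windup() calls
that hook on every windup.  Something along these lines (struct
timecounter and tc_init() are the real interfaces; the names and
numbers below are illustrative placeholders, not a paste from the
driver):

#include <sys/param.h>
#include <sys/timetc.h>

static u_int dmtpps_get_timecount(struct timecounter *tc);
static void  dmtpps_poll(struct timecounter *tc);

/*
 * Setting tc_poll_pps is what makes tc_windup() call back into the
 * driver once per windup, so a hardware capture register can be
 * checked and latched into the pps state.
 */
static struct timecounter dmtpps_tc = {
	.tc_get_timecount = dmtpps_get_timecount,
	.tc_poll_pps      = dmtpps_poll,	/* non-NULL => polled capture */
	.tc_counter_mask  = ~0u,
	.tc_frequency     = 24000000,		/* placeholder */
	.tc_name          = "DMTimer-pps",	/* placeholder */
	.tc_quality       = 1000,		/* placeholder */
};

/* Registered once at attach time with tc_init(&dmtpps_tc). */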

> 
> The issue you see might come from tc_ticktock(), but it is strange.
> All calls to tc_windup() are serialized by a spinlock.  tc_ticktock(),
> potentially being called from interrupt context, uses a try-lock to
> avoid spinning while another CPU does the update.  But this is
> impossible on a UP machine, because a spinlock in the top half disables
> interrupts, which means the event that would trigger a tc_ticktock()
> call whose trylock fails cannot occur until the top half finishes.
> 
> Having too many timehands is bad because a reader gets a higher chance
> of finding an obsolete th pointer.  With the rework of the generation
> counts for timehands a reader would not use stale data, but it might
> have to spin for a somewhat prolonged time.
> 
> In fact, it might be worth trying to reduce the timehands count from
> two to one, since then there is no non-current th at all.
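
(For reference, the reader side being described there is the
generation-check loop used by binuptime() and friends in kern_tc.c;
this is paraphrased from memory, not quoted verbatim:)

void
binuptime_sketch(struct bintime *bt)
{
	struct timehands *th;
	u_int gen;

	do {
		/* Snapshot the current timehands and its generation. */
		th = timehands;
		gen = atomic_load_acq_int(&th->th_generation);
		*bt = th->th_offset;
		bintime_addx(bt, th->th_scale * tc_delta(th));
		atomic_thread_fence_acq();
		/*
		 * Retry if tc_windup() recycled this timehands while we
		 * were reading it (generation changed) or is recycling
		 * it right now (generation forced to zero).
		 */
	} while (gen == 0 || gen != th->th_generation);
}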

The driver involved is arm/ti/am335x/am335x_dmtpps.c.  The basic
sequence that occurs is:

The dmtpps_poll() function is called (via the timecounter.tc_poll_pps
pointer), and it calls pps_capture(), then schedules a taskqueue_fast
task to do the rest of the processing.  That task function calls
pps_event(), and most of the time pps_event() does nothing because it
hits this early out:

	/* If the timecounter was wound up underneath us, bail out. */
	if (pps->capgen == 0 || pps->capgen !=
	    atomic_load_acq_int(&pps->capth->th_generation))
		return;
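
Boiled down, the capture/defer structure looks like this
(pps_capture(), pps_event(), and the taskqueue calls are the real
KPIs; everything else is a from-memory sketch, not a paste of the
file):

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/taskqueue.h>
#include <sys/timepps.h>
#include <sys/timetc.h>

static struct pps_state dmtpps_pps;
static struct task	dmtpps_task;	/* TASK_INIT()ed at attach time */

/*
 * Called from tc_windup() via tc_poll_pps.  pps_capture() records the
 * current timehands and its generation in the pps_state, then the
 * rest of the processing is deferred to a taskqueue.
 */
static void
dmtpps_poll(struct timecounter *tc)
{
	pps_capture(&dmtpps_pps);
	taskqueue_enqueue(taskqueue_fast, &dmtpps_task);
}

/*
 * Taskqueue handler.  By the time this runs, tc_windup() may have run
 * again (possibly several times), so the generation recorded by
 * pps_capture() no longer matches and pps_event() hits the early out
 * quoted above -- the pulse is silently dropped.
 */
static void
dmtpps_event_task(void *arg, int pending)
{
	pps_event(&dmtpps_pps, PPS_CAPTUREASSERT);
}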

I forget the exact sequence, but something about getting from the point
where the taskqueue task begins running to the point where it calls
pps_event() almost always burned through 3 timehands generations; if
the machine was busy enough, sometimes there would be only one windup
and the event would get processed properly.

I don't remember why I used a taskqueue task to separate the capture
and event processing.  Maybe so I could use a mutex to protect the pps
event state, and it was written before I extended the internal pps api
stuff to let drivers use a spin mutex with pps_ioctl().
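
If memory serves, with that extension a driver can skip the taskqueue
entirely and call pps_event() from the poll/interrupt path under its
own spin mutex; the opt-in looks roughly like this (sketched from
memory, so treat the field and flag names as approximate):

#include <sys/param.h>
#include <sys/lock.h>
#include <sys/mutex.h>
#include <sys/timepps.h>

static struct mtx	dmtpps_mtx;
static struct pps_state dmtpps_pps;

/*
 * From-memory sketch: hand the driver's spin mutex to the pps code so
 * pps_ioctl() and the capture path can share one lock, instead of
 * deferring pps_event() to a taskqueue.
 */
static void
dmtpps_pps_init(void)
{
	mtx_init(&dmtpps_mtx, "dmtpps", NULL, MTX_SPIN);

	dmtpps_pps.ppscap = PPS_CAPTUREASSERT;
	dmtpps_pps.flags = PPSFLAG_MTX_SPIN;	/* driver_mtx is a spin mutex */
	dmtpps_pps.driver_abi = PPS_ABI_VERSION;
	dmtpps_pps.driver_mtx = &dmtpps_mtx;
	pps_init_abi(&dmtpps_pps);
}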

-- Ian


