Allwinner awg TX hanging issue

John-Mark Gurney jmg at funkthat.com
Thu Sep 6 16:35:54 UTC 2018


Since I upgraded to a recent -current to fix the timer issue on my
A64-LTS board, I've been having an issue where the Ethernet interface
freezes.  This is with:
FreeBSD gate2.funkthat.com 12.0-ALPHA4 FreeBSD 12.0-ALPHA4 #4 r338426M: Wed Sep  5 09:55:12 PDT 2018     root at gate2.funkthat.com:/usr/src/sys/arm64/compile/GENERIC  arm64

The local modifications (the M in r338426M) add some DTrace probe
points to debug this issue.  I also dropped the _OACTIVE check from
_start_locked.

The probes print the flags at the beginning of _start_locked, when
_OACTIVE gets set, and at the end of _txeof if progress was made.  They
also print the amount of progress at the end of _txeof, if any, and the
interrupt status value in _intr.

I noticed that when it was hung, the OACTIVE flag was set, but that
only means we ran out of transmit descriptors; it is a symptom of the
problem, not its cause.
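To make the symptom-vs-cause distinction concrete, here is a
deliberately simplified C model of the usual FreeBSD driver pattern.
It is not the awg code: struct tx_ring, ring_start(), ring_txeof() and
TX_RING_SIZE are hypothetical names.  The start routine sets an
OACTIVE-like flag when it runs out of descriptors, and only a txeof
pass that reclaims completed descriptors clears it again.

```c
#include <assert.h>
#include <stdbool.h>

#define TX_RING_SIZE 4  /* arbitrary small size for illustration */

struct tx_ring {
	int  free;     /* descriptors available to the driver */
	bool oactive;  /* models IFF_DRV_OACTIVE: TX queue is full */
};

/* Like a start routine: queue packets until descriptors run out. */
int
ring_start(struct tx_ring *r, int npkts)
{
	int queued = 0;

	while (queued < npkts) {
		if (r->free == 0) {
			r->oactive = true;  /* out of descriptors */
			break;
		}
		r->free--;
		queued++;
	}
	return (queued);
}

/* Like txeof: reclaim descriptors the hardware has finished with. */
void
ring_txeof(struct tx_ring *r, int ncompleted)
{
	r->free += ncompleted;
	assert(r->free <= TX_RING_SIZE);  /* can't free more than exist */
	if (ncompleted > 0)
		r->oactive = false;  /* progress made; start may run again */
}
```

In this model, a stopped TX DMA engine looks like a txeof pass that
reclaims nothing (ring_txeof(&r, 0)), or no txeof at all: the flag
never clears, so the stuck OACTIVE is a consequence, not the fault.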

I don't have a good test to trigger this problem.  It happens fairly
regularly on my router, every 4 to 12 hours, but my test board, which
is lightly loaded and does not run pf, doesn't have this issue.

With the added DTrace probe points, I finally hit this:
  3  10115                        none:intr intr 40000024
  3  10115                        none:intr intr 40000100
  3  10115                        none:intr intr 40000100
  3  10115                        none:intr intr 40000100
  3  10115                        none:intr intr 40000100
  3  10114                       none:flags flag 40
  3  10115                        none:intr intr 40000024
  3  10115                        none:intr intr 40000100
  3  10114                       none:flags flag 40
  3  10115                        none:intr intr 40000024
  3  10115                        none:intr intr 40000100
  3  10114                       none:flags flag 40
  3  10115                        none:intr intr 40000024
  3  10115                        none:intr intr 40000100
  3  10114                       none:flags flag 40
  3  10115                        none:intr intr 40000100
  3  10115                        none:intr intr 40000024
  3  10115                        none:intr intr 4000010a
  3  10114                       none:flags flag 40
  3  10115                        none:intr intr 40000100
  3  10114                       none:flags flag 40
  3  10115                        none:intr intr 40000100
  3  10114                       none:flags flag 40
[...]
  3  10115                        none:intr intr 40000100
  3  10114                       none:flags flag 40
  3  10114                       none:flags flag 440
  3  10115                        none:intr intr 40000100
  3  10115                        none:intr intr 40000100
  3  10115                        none:intr intr 40000100

The intr 24 lines are normal interrupts, which run _txeof to free up
descriptors.  The intr 100 lines say that the RGMII link status
changed.  We don't enable that interrupt, so I'm not sure why we are
getting it (it seems like the enable bit is ignored).  Both kinds of
line are normal, and we see them for a long while.  The flag 440 line
is where we set OACTIVE.  After it we see no more flag 40 lines, which
means that _start_locked doesn't get called and that _txeof doesn't
make forward progress.
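The flag values in the trace decode against FreeBSD's <net/if.h>,
where IFF_DRV_RUNNING is 0x40 and IFF_DRV_OACTIVE is 0x400, so flag 40
is RUNNING and flag 440 is RUNNING | OACTIVE.  The sketch below uses
local copies of those values so it compiles outside the kernel tree.
can_start() mirrors the common driver idiom for the start check
(presumably the check dropped from _start_locked), and first_oactive()
is a hypothetical helper for finding the flag 440 entry in a sequence
of logged values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Local copies of the FreeBSD <net/if.h> driver flag values. */
#define DRV_RUNNING 0x40u   /* IFF_DRV_RUNNING: resources allocated */
#define DRV_OACTIVE 0x400u  /* IFF_DRV_OACTIVE: TX hardware queue full */

/* The "flag 440" line in the trace is exactly RUNNING | OACTIVE. */
static_assert((DRV_RUNNING | DRV_OACTIVE) == 0x440u, "flag 440 decode");

/* True if the usual start check would let packets be queued. */
bool
can_start(unsigned flags)
{
	return ((flags & (DRV_RUNNING | DRV_OACTIVE)) == DRV_RUNNING);
}

/* Index of the first logged flags value with OACTIVE set, or -1. */
long
first_oactive(const unsigned *flags, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (flags[i] & DRV_OACTIVE)
			return ((long)i);
	return (-1);
}
```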

The problem point is the intr 10a line.  Once we hit that line, we
never get another intr 24 line.  The a (0x8 | 0x2) is the important
part of the interrupt status, as it is the combination of:

0x8
TX_TIMEOUT_INT
When this bit is asserted, the transmitter had been excessively active.

and:

0x2
TX_DMA_STOPPED_INT
When this bit is asserted, the TX DMA FSM is stopped.
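The arithmetic behind reading the low nibble of 4000010a can be
checked with a small helper.  This sketch defines only the two bits
quoted above; the helper name tx_stall_bits_set() is hypothetical, and
it deliberately ignores every other bit of the status word.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TX_DMA_STOPPED_INT 0x2u  /* TX DMA FSM is stopped */
#define TX_TIMEOUT_INT     0x8u  /* transmitter excessively active */

/* The "a" in the intr 4000010a line is both bits together. */
#define TX_STALL_MASK (TX_TIMEOUT_INT | TX_DMA_STOPPED_INT)
static_assert(TX_STALL_MASK == 0xau, "0x8 | 0x2 must be 0xa");

/* True if a logged status value has both TX stall bits asserted. */
bool
tx_stall_bits_set(uint32_t status)
{
	return ((status & TX_STALL_MASK) == TX_STALL_MASK);
}
```

Applied to the trace, only the 4000010a value trips the check; the
40000024 and 40000100 values on the normal lines do not.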

We do not have code in the awg driver to recover from this problem.

Does anyone have any ideas?

Thanks.

-- 
  John-Mark Gurney				Voice: +1 415 225 5579

     "All that I will do, has been done, All that I have, has not."


More information about the freebsd-arm mailing list