[Bug 219927] awg0 stops working after a long output under ssh

Mon Jun 12 13:31:27 UTC 2017

Tested with TX_MAX_SEGS at 20 and that is also stable for me, so I added a
patch to the original bug report:

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=219927

The only downside I can see is a modest increase in kernel stack usage:

Index: sys/arm/allwinner/if_awg.c
===================================================================

--- sys/arm/allwinner/if_awg.c  (revision 319826)
+++ sys/arm/allwinner/if_awg.c  (working copy)
@@ -92,7 +92,7 @@
 #define        TX_SKIP(n, o)           (((n) + (o)) & (TX_DESC_COUNT - 1))
 #define        RX_NEXT(n)              (((n) + 1) & (RX_DESC_COUNT - 1))

-#define        TX_MAX_SEGS             10
+#define        TX_MAX_SEGS             20

 #define        SOFT_RST_RETRY          1000
 #define        MII_BUSY_RETRY          1000
@@ -419,14 +419,18 @@
            sc->tx.buf_map[index].map, m, segs, &nsegs, BUS_DMA_NOWAIT);
        if (error == EFBIG) {
                m = m_collapse(m, M_NOWAIT, TX_MAX_SEGS);
-               if (m == NULL)
+               if (m == NULL) {
+                       device_printf(sc->miibus, "awg_setup_txbuf: m_collapse failed\n");
                        return (0);
+               }
                *mp = m;
                error = bus_dmamap_load_mbuf_sg(sc->tx.buf_tag,
                     sc->tx.buf_map[index].map, m, segs, &nsegs, BUS_DMA_NOWAIT);
        }
-       if (error != 0)
+       if (error != 0) {
+               device_printf(sc->miibus, "awg_setup_txbuf: bus_dmamap_load_mbuf_sg failed\n");
                return (0);
+       }

        bus_dmamap_sync(sc->tx.buf_tag, sc->tx.buf_map[index].map,
            BUS_DMASYNC_PREWRITE);




On Mon, 12 Jun 2017 at 10:47, Tom Vijlbrief <tvijlbrief at gmail.com> wrote:

>
>
> On Mon, 12 Jun 2017 at 09:59, Henri Hennebert <hlh at restart.be> wrote:
>
>> On 06/11/2017 17:54, Tom Vijlbrief wrote:
>> >
>> > On Sun, 11 Jun 2017 at 16:23, <bugzilla-noreply at freebsd.org
>> > <mailto:bugzilla-noreply at freebsd.org>> wrote:
>> >
>> >     https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=219927
>> >
>> >                  Bug ID: 219927
>> >                 Summary: awg0 stops working after a long output under ssh
>> >                 Product: Base System
>> >                 Version: CURRENT
>> >                Hardware: arm64
>> >                      OS: Any
>> >                  Status: New
>> >                Severity: Affects Only Me
>> >                Priority: ---
>> >               Component: arm
>> >                Assignee: freebsd-arm at FreeBSD.org
>> >                Reporter: hlh at restart.be <mailto:hlh at restart.be>
>> >
>> >     Environment: pine64+ 2GB
>> >     FreeBSD norquay.restart.bel 12.0-CURRENT FreeBSD 12.0-CURRENT #0
>> >     r318945M: Sat Jun 10 11:47:44 CEST 2017
>> >     root at norquay.restart.bel:/usr/obj/usr/src/sys/NORQUAY  arm64
>> >
>> >     If I connect from a wireless computer (FreeBSD 11.1-PRERELEASE #0
>> >     r318860) and run a command with a large output (e.g. `find /`),
>> >     awg0 stops working quickly (under 20 seconds of output).
>> >
>> >     If I do the same with telnet from the same computer, the output
>> >     runs much longer, but awg0 still stops working.
>> >
>> >     If I do the same from a wired computer, I must run `find /` 2 or
>> >     3 times before awg0 stops working.
>> >
>> >     I can rsync 12GB through ssh without problems in both directions
>> >     (from and to the pine64 and the wireless computer).
>> >
>> >     I have a `tcpdump -w ssh.data port 22` capture (8.3 MB).
>> >
>> >     I can connect to the pine64 with a serial console after awg0 stops
>> >     working. Running
>> >     ifconfig awg0 down
>> >     ifconfig awg0 up
>> >     does not restore connectivity. I must reboot to restore it.
>> >
>> >
>> > That's a coincidence: I'm investigating the same issue today.
>> >
>> > You could try increasing TX_MAX_SEGS in sys/arm/allwinner/if_awg.c,
>> > line 95.
>> >
>> > I'm currently testing with TX_MAX_SEGS set to 40 and have seen no
>> > lockup yet.
>>
>> Bingo. Your solution solved the problem.
>>
>> Thanks a lot.
>>
>
> Good to hear!
>
> Increasing from 10 to 20 is probably sufficient. It is not clear to me
> what the adverse effects of too high a value are.
>
> The root cause is that the driver calls m_collapse with this limit and
> the call fails. The TCP stack then resends the packet, m_collapse fails
> again, and so on.
>
>
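
The retry loop described above can be illustrated with a toy user-space model. This is only a sketch: it assumes that a given mbuf chain has some smallest segment count that collapsing can reach, and that a retransmission presents the same chain again. The function names and the min_segs_after_collapse parameter are hypothetical and do not exist in the driver or in the mbuf API.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model of one transmit attempt. The first DMA load succeeds if the
 * chain already fits in max_segs; otherwise (the EFBIG path) the attempt
 * only succeeds if collapsing can bring the chain down to max_segs.
 * min_segs_after_collapse is a hypothetical property of the chain.
 */
static bool
tx_attempt_ok(int chain_segs, int min_segs_after_collapse, int max_segs)
{
	if (chain_segs <= max_segs)
		return (true);
	return (min_segs_after_collapse <= max_segs);
}

/*
 * Count how many of `attempts` retransmissions of the same chain are
 * dropped. Nothing changes between attempts, so the outcome repeats.
 */
static int
dropped_retransmits(int chain_segs, int min_segs_after_collapse,
    int max_segs, int attempts)
{
	int dropped = 0;

	for (int i = 0; i < attempts; i++) {
		if (!tx_attempt_ok(chain_segs, min_segs_after_collapse,
		    max_segs))
			dropped++;
	}
	return (dropped);
}
```

In this model a chain of 25 segments that can be collapsed to 14 at best is dropped on every attempt with a limit of 10, and goes out on the first attempt with a limit of 20, which matches the behaviour seen when raising TX_MAX_SEGS.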