svn commit: r331347 - in head: etc/mtree include sys/conf sys/dev/tcp_log sys/kern sys/netinet usr.bin/netstat

Ruslan Bukin ruslan.bukin at cl.cam.ac.uk
Thu Mar 22 18:23:15 UTC 2018


Take a look at these build logs:
https://ci.freebsd.org/job/FreeBSD-head-mips-build/lastBuild/console
https://ci.freebsd.org/job/FreeBSD-head-powerpc-build/lastBuild/console

Example build commands:
make -j5 TARGET=mips TARGET_ARCH=mipsel kernel-toolchain
make -j5 TARGET=mips TARGET_ARCH=mipsel KERNCONF=CANNA buildkernel
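
For context on the atomic_fetchadd_64 remark quoted below: some 32-bit
targets (mips32 apparently among them) cannot do a 64-bit atomic
read-modify-write natively. Here is a minimal userspace sketch (C11, not
kernel code) of how a 64-bit fetch-and-add could fall back to a lock in
that case. The names counter64, counter64_init, counter64_fetchadd and
the HAVE_NATIVE_ATOMIC64 macro are made up for illustration; nothing
here is taken from the commit.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

/*
 * Hypothetical switch: 1 when unsigned long long atomics are always
 * lock-free on this target, 0 otherwise. Override with
 * -DHAVE_NATIVE_ATOMIC64=0 to exercise the fallback path.
 */
#ifndef HAVE_NATIVE_ATOMIC64
#define HAVE_NATIVE_ATOMIC64 (ATOMIC_LLONG_LOCK_FREE == 2)
#endif

struct counter64 {
#if HAVE_NATIVE_ATOMIC64
	_Atomic uint64_t value;
#else
	pthread_mutex_t lock;	/* protects value on the fallback path */
	uint64_t value;
#endif
};

static void
counter64_init(struct counter64 *c, uint64_t start)
{
	assert(c != NULL);
#if HAVE_NATIVE_ATOMIC64
	atomic_init(&c->value, start);
#else
	pthread_mutex_init(&c->lock, NULL);
	c->value = start;
#endif
}

/* Add delta and return the value the counter held before the add. */
static uint64_t
counter64_fetchadd(struct counter64 *c, uint64_t delta)
{
	uint64_t old;

	assert(c != NULL);
#if HAVE_NATIVE_ATOMIC64
	old = atomic_fetch_add(&c->value, delta);
#else
	pthread_mutex_lock(&c->lock);
	old = c->value;
	c->value += delta;
	pthread_mutex_unlock(&c->lock);
#endif
	return (old);
}
```

Either path returns the previous value, so callers do not need to know
which one was compiled in. A kernel fix would look different; this only
shows the shape of the fallback.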

Ruslan

On Thu, Mar 22, 2018 at 03:39:23PM +0000, Jonathan Looney wrote:
>    A tinderbox build didn't complain about atomic_fetchadd_64, so I assume it
>    is OK.
>    Yes, this can be made optional, if there is a need for that.
>    Jonathan
>    On Thu, Mar 22, 2018 at 2:22 PM, Ruslan Bukin
>    <[1]ruslan.bukin at cl.cam.ac.uk> wrote:
> 
>      Also, can this be made pluggable?
>      It looks like it is an optional device, which means it could free up
>      some space in embedded environments when unused.
>      Ruslan
>      On Thu, Mar 22, 2018 at 02:16:06PM +0000, Ruslan Bukin wrote:
>      > We don't have atomic_fetchadd_64 for mips32 I think
>      >
>      > Ruslan
>      >
>      > On Thu, Mar 22, 2018 at 09:40:08AM +0000, Jonathan T. Looney wrote:
>      > > Author: jtl
>      > > Date: Thu Mar 22 09:40:08 2018
>      > > New Revision: 331347
>      > > URL: [2]https://svnweb.freebsd.org/changeset/base/331347
>      > >
>      > > Log:
>      > >   Add the "TCP Blackbox Recorder" which we discussed at the developer
>      > >   summits at BSDCan and BSDCam in 2017.
>      > >
>      > >   The TCP Blackbox Recorder allows you to capture events on a TCP connection
>      > >   in a ring buffer. It stores metadata with the event. It optionally stores
>      > >   the TCP header associated with an event (if the event is associated with a
>      > >   packet) and also optionally stores information on the sockets.
>      > >
>      > >   It supports setting a log ID on a TCP connection and using this to correlate
>      > >   multiple connections that share a common log ID.
>      > >
>      > >   You can log connections in different modes. If you are doing a coordinated
>      > >   test with a particular connection, you may tell the system to put it in
>      > >   mode 4 (continuous dump). Or, if you just want to monitor for errors, you
>      > >   can put it in mode 1 (ring buffer) and dump all the ring buffers associated
>      > >   with the connection ID when we receive an error signal for that connection
>      > >   ID. You can set a default mode that will be applied to a particular ratio
>      > >   of incoming connections. You can also manually set a mode using a socket
>      > >   option.
>      > >
>      > >   This commit includes only basic probes. rrs@ has added quite an abundance
>      > >   of probes in his TCP development work. He plans to commit those soon.
>      > >
>      > >   There are user-space programs which we plan to commit as ports. These read
>      > >   the data from the log device and output pcapng files, and then let you
>      > >   analyze the data (and metadata) in the pcapng files.
>      > >
>      > >   Reviewed by:      gnn (previous version)
>      > >   Obtained from:    Netflix, Inc.
>      > >   Relnotes: yes
>      > >   Differential Revision:   [3]https://reviews.freebsd.org/D11085
>      > >
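
Since the log mentions user-space readers: going by tcp_log_dev.c
further down, the device is created as "tcp_log" with mode 0400, must be
opened read-only, supports poll() for POLLIN, and a non-blocking read
returns EAGAIN when nothing is queued. A rough sketch of the read side
under those assumptions follows. The helper name read_when_ready is
hypothetical, and it works on any pollable descriptor rather than
depending on the device itself.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Wait up to timeout_ms for fd to become readable, then read at most
 * cap bytes into buf.
 *
 * Returns the number of bytes read, 0 on timeout or end of file, and
 * -1 on error (errno is set by poll() or read()).
 */
static ssize_t
read_when_ready(int fd, void *buf, size_t cap, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };
	int n;

	assert(buf != NULL);

	/* Retry if a signal interrupts the wait. */
	do {
		n = poll(&pfd, 1, timeout_ms);
	} while (n == -1 && errno == EINTR);

	if (n <= 0)
		return (n);	/* 0: timed out, -1: poll() failed */

	return (read(fd, buf, cap));
}
```

A reader would presumably open("/dev/tcp_log", O_RDONLY) as root and
call this in a loop, handing the bytes to whatever parses the stream.
The record layout itself is defined in tcp_log_dev.h and is not
reproduced here.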
>      > > Added:
>      > >   head/sys/dev/tcp_log/
>      > >   head/sys/dev/tcp_log/tcp_log_dev.c   (contents, props changed)
>      > >   head/sys/dev/tcp_log/tcp_log_dev.h   (contents, props changed)
>      > >   head/sys/netinet/tcp_log_buf.c   (contents, props changed)
>      > >   head/sys/netinet/tcp_log_buf.h   (contents, props changed)
>      > > Modified:
>      > >   head/etc/mtree/BSD.include.dist
>      > >   head/include/Makefile
>      > >   head/sys/conf/files
>      > >   head/sys/kern/subr_witness.c
>      > >   head/sys/netinet/tcp.h
>      > >   head/sys/netinet/tcp_input.c
>      > >   head/sys/netinet/tcp_output.c
>      > >   head/sys/netinet/tcp_subr.c
>      > >   head/sys/netinet/tcp_timer.c
>      > >   head/sys/netinet/tcp_usrreq.c
>      > >   head/sys/netinet/tcp_var.h
>      > >   head/usr.bin/netstat/inet.c
>      > >   head/usr.bin/netstat/main.c
>      > >   head/usr.bin/netstat/netstat.1
>      > >   head/usr.bin/netstat/netstat.h
>      > >
>      > > Modified: head/etc/mtree/BSD.include.dist
>      > >
>      ==============================================================================
>      > > --- head/etc/mtree/BSD.include.dist Thu Mar 22 08:32:39 2018     
>        (r331346)
>      > > +++ head/etc/mtree/BSD.include.dist Thu Mar 22 09:40:08 2018     
>        (r331347)
>      > > @@ -158,6 +158,8 @@
>      > >          ..
>      > >          speaker
>      > >          ..
>      > > +        tcp_log
>      > > +        ..
>      > >          usb
>      > >          ..
>      > >          vkbd
>      > >
>      > > Modified: head/include/Makefile
>      > >
>      ==============================================================================
>      > > --- head/include/Makefile   Thu Mar 22 08:32:39 2018       
>      (r331346)
>      > > +++ head/include/Makefile   Thu Mar 22 09:40:08 2018       
>      (r331347)
>      > > @@ -47,7 +47,7 @@ LSUBDIRS= cam/ata cam/mmc cam/nvme cam/scsi \
>      > >     dev/hwpmc dev/hyperv \
>      > >     dev/ic dev/iicbus dev/io dev/lmc dev/mfi dev/mmc dev/nvme \
>      > >     dev/ofw dev/pbio dev/pci ${_dev_powermac_nvram} dev/ppbus
>      dev/smbus \
>      > > -   dev/speaker dev/vkbd dev/wi \
>      > > +   dev/speaker dev/tcp_log dev/vkbd dev/wi \
>      > >     fs/devfs fs/fdescfs fs/msdosfs fs/nandfs fs/nfs fs/nullfs \
>      > >     fs/procfs fs/smbfs fs/udf fs/unionfs \
>      > >     geom/cache geom/concat geom/eli geom/gate geom/journal
>      geom/label \
>      > >
>      > > Modified: head/sys/conf/files
>      > >
>      ==============================================================================
>      > > --- head/sys/conf/files     Thu Mar 22 08:32:39 2018       
>      (r331346)
>      > > +++ head/sys/conf/files     Thu Mar 22 09:40:08 2018       
>      (r331347)
>      > > @@ -3161,6 +3161,7 @@ dev/syscons/star/star_saver.c optional
>      star_saver
>      > >  dev/syscons/syscons.c              optional sc
>      > >  dev/syscons/sysmouse.c             optional sc
>      > >  dev/syscons/warp/warp_saver.c      optional warp_saver
>      > > +dev/tcp_log/tcp_log_dev.c  optional inet | inet6
>      > >  dev/tdfx/tdfx_linux.c              optional tdfx_linux tdfx
>      compat_linux
>      > >  dev/tdfx/tdfx_pci.c                optional tdfx pci
>      > >  dev/ti/if_ti.c                     optional ti pci
>      > > @@ -4309,6 +4310,7 @@ netinet/tcp_debug.c           optional
>      tcpdebug
>      > >  netinet/tcp_fastopen.c             optional inet
>      tcp_rfc7413 | inet6 tcp_rfc7413
>      > >  netinet/tcp_hostcache.c            optional inet | inet6
>      > >  netinet/tcp_input.c                optional inet | inet6
>      > > +netinet/tcp_log_buf.c              optional inet | inet6
>      > >  netinet/tcp_lro.c          optional inet | inet6
>      > >  netinet/tcp_output.c               optional inet | inet6
>      > >  netinet/tcp_offload.c              optional tcp_offload
>      inet | tcp_offload inet6
>      > >
>      > > Added: head/sys/dev/tcp_log/tcp_log_dev.c
>      > >
>      ==============================================================================
>      > > --- /dev/null       00:00:00 1970   (empty, because file is
>      newly added)
>      > > +++ head/sys/dev/tcp_log/tcp_log_dev.c      Thu Mar 22 09:40:08
>      2018        (r331347)
>      > > @@ -0,0 +1,521 @@
>      > > +/*-
>      > > + * SPDX-License-Identifier: BSD-2-Clause-FreeBSD
>      > > + *
>      > > + * Copyright (c) 2016-2017
>      > > + * Netflix Inc.  All rights reserved.
>      > > + *
>      > > + * Redistribution and use in source and binary forms, with or
>      without
>      > > + * modification, are permitted provided that the following
>      conditions
>      > > + * are met:
>      > > + * 1. Redistributions of source code must retain the above
>      copyright
>      > > + *    notice, this list of conditions and the following
>      disclaimer.
>      > > + * 2. Redistributions in binary form must reproduce the above
>      copyright
>      > > + *    notice, this list of conditions and the following
>      disclaimer in the
>      > > + *    documentation and/or other materials provided with the
>      distribution.
>      > > + *
>      > > + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS
>      IS'' AND
>      > > + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
>      TO, THE
>      > > + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
>      PARTICULAR PURPOSE
>      > > + * ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS
>      BE LIABLE
>      > > + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
>      CONSEQUENTIAL
>      > > + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
>      SUBSTITUTE GOODS
>      > > + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
>      INTERRUPTION)
>      > > + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
>      CONTRACT, STRICT
>      > > + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
>      IN ANY WAY
>      > > + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
>      POSSIBILITY OF
>      > > + * SUCH DAMAGE.
>      > > + *
>      > > + */
>      > > +
>      > > +#include <sys/cdefs.h>
>      > > +__FBSDID("$FreeBSD$");
>      > > +
>      > > +#include <sys/param.h>
>      > > +#include <sys/conf.h>
>      > > +#include <sys/fcntl.h>
>      > > +#include <sys/filio.h>
>      > > +#include <sys/kernel.h>
>      > > +#include <sys/lock.h>
>      > > +#include <sys/malloc.h>
>      > > +#include <sys/module.h>
>      > > +#include <sys/poll.h>
>      > > +#include <sys/queue.h>
>      > > +#include <sys/refcount.h>
>      > > +#include <sys/mutex.h>
>      > > +#include <sys/selinfo.h>
>      > > +#include <sys/socket.h>
>      > > +#include <sys/socketvar.h>
>      > > +#include <sys/sysctl.h>
>      > > +#include <sys/tree.h>
>      > > +#include <sys/uio.h>
>      > > +#include <machine/atomic.h>
>      > > +#include <sys/counter.h>
>      > > +
>      > > +#include <dev/tcp_log/tcp_log_dev.h>
>      > > +
>      > > +#ifdef TCPLOG_DEBUG_COUNTERS
>      > > +extern counter_u64_t tcp_log_que_read;
>      > > +extern counter_u64_t tcp_log_que_freed;
>      > > +#endif
>      > > +
>      > > +static struct cdev *tcp_log_dev;
>      > > +static struct selinfo tcp_log_sel;
>      > > +
>      > > +static struct log_queueh tcp_log_dev_queue_head =
>      STAILQ_HEAD_INITIALIZER(tcp_log_dev_queue_head);
>      > > +static struct log_infoh tcp_log_dev_reader_head =
>      STAILQ_HEAD_INITIALIZER(tcp_log_dev_reader_head);
>      > > +
>      > > +MALLOC_DEFINE(M_TCPLOGDEV, "tcp_log_dev", "TCP log device data
>      structures");
>      > > +
>      > > +static int tcp_log_dev_listeners = 0;
>      > > +
>      > > +static struct mtx tcp_log_dev_queue_lock;
>      > > +
>      > > +#define    TCP_LOG_DEV_QUEUE_LOCK()       
>      mtx_lock(&tcp_log_dev_queue_lock)
>      > > +#define    TCP_LOG_DEV_QUEUE_UNLOCK()     
>      mtx_unlock(&tcp_log_dev_queue_lock)
>      > > +#define    TCP_LOG_DEV_QUEUE_LOCK_ASSERT()
>      mtx_assert(&tcp_log_dev_queue_lock, MA_OWNED)
>      > > +#define    TCP_LOG_DEV_QUEUE_UNLOCK_ASSERT()
>      mtx_assert(&tcp_log_dev_queue_lock, MA_NOTOWNED)
>      > > +#define    TCP_LOG_DEV_QUEUE_REF(tldq)   
>       refcount_acquire(&((tldq)->tldq_refcnt))
>      > > +#define    TCP_LOG_DEV_QUEUE_UNREF(tldq) 
>       refcount_release(&((tldq)->tldq_refcnt))
>      > > +
>      > > +static void        tcp_log_dev_clear_refcount(struct
>      tcp_log_dev_queue *entry);
>      > > +static void        tcp_log_dev_clear_cdevpriv(void *data);
>      > > +static int tcp_log_dev_open(struct cdev *dev __unused, int flags,
>      > > +    int devtype __unused, struct thread *td __unused);
>      > > +static int tcp_log_dev_write(struct cdev *dev __unused,
>      > > +    struct uio *uio __unused, int flags __unused);
>      > > +static int tcp_log_dev_read(struct cdev *dev __unused, struct uio
>      *uio,
>      > > +    int flags __unused);
>      > > +static int tcp_log_dev_ioctl(struct cdev *dev __unused, u_long cmd,
>      > > +    caddr_t data, int fflag __unused, struct thread *td
>      __unused);
>      > > +static int tcp_log_dev_poll(struct cdev *dev __unused, int events,
>      > > +    struct thread *td);
>      > > +
>      > > +
>      > > +enum tcp_log_dev_queue_lock_state {
>      > > +   QUEUE_UNLOCKED = 0,
>      > > +   QUEUE_LOCKED,
>      > > +};
>      > > +
>      > > +static struct cdevsw tcp_log_cdevsw = {
>      > > +   .d_version =    D_VERSION,
>      > > +   .d_read =       tcp_log_dev_read,
>      > > +   .d_open =       tcp_log_dev_open,
>      > > +   .d_write =      tcp_log_dev_write,
>      > > +   .d_poll =       tcp_log_dev_poll,
>      > > +   .d_ioctl =      tcp_log_dev_ioctl,
>      > > +#ifdef NOTYET
>      > > +   .d_mmap =       tcp_log_dev_mmap,
>      > > +#endif
>      > > +   .d_name =       "tcp_log",
>      > > +};
>      > > +
>      > > +static __inline void
>      > > +tcp_log_dev_queue_validate_lock(int lockstate)
>      > > +{
>      > > +
>      > > +#ifdef INVARIANTS
>      > > +   switch (lockstate) {
>      > > +   case QUEUE_LOCKED:
>      > > +           TCP_LOG_DEV_QUEUE_LOCK_ASSERT();
>      > > +           break;
>      > > +   case QUEUE_UNLOCKED:
>      > > +           TCP_LOG_DEV_QUEUE_UNLOCK_ASSERT();
>      > > +           break;
>      > > +   default:
>      > > +           kassert_panic("%s:%d: unknown queue lock state",
>      __func__,
>      > > +               __LINE__);
>      > > +   }
>      > > +#endif
>      > > +}
>      > > +
>      > > +/*
>      > > + * Clear the refcount. If appropriate, it will remove the entry
>      from the
>      > > + * queue and call the destructor.
>      > > + *
>      > > + * This must be called with the queue lock held.
>      > > + */
>      > > +static void
>      > > +tcp_log_dev_clear_refcount(struct tcp_log_dev_queue *entry)
>      > > +{
>      > > +
>      > > +   KASSERT(entry != NULL, ("%s: called with NULL entry",
>      __func__));
>      > > +
>      > > +   TCP_LOG_DEV_QUEUE_LOCK_ASSERT();
>      > > +
>      > > +   if (TCP_LOG_DEV_QUEUE_UNREF(entry)) {
>      > > +#ifdef TCPLOG_DEBUG_COUNTERS
>      > > +           counter_u64_add(tcp_log_que_freed, 1);
>      > > +#endif
>      > > +           /* Remove the entry from the queue and call the
>      destructor. */
>      > > +           STAILQ_REMOVE(&tcp_log_dev_queue_head, entry,
>      tcp_log_dev_queue,
>      > > +               tldq_queue);
>      > > +           (*entry->tldq_dtor)(entry);
>      > > +   }
>      > > +}
>      > > +
>      > > +static void
>      > > +tcp_log_dev_clear_cdevpriv(void *data)
>      > > +{
>      > > +   struct tcp_log_dev_info *priv;
>      > > +   struct tcp_log_dev_queue *entry, *entry_tmp;
>      > > +
>      > > +   priv = (struct tcp_log_dev_info *)data;
>      > > +   if (priv == NULL)
>      > > +           return;
>      > > +
>      > > +   /*
>      > > +    * Lock the queue and drop our references. We hold references
>      to all
>      > > +    * the entries starting with tldi_head (or, if tldi_head ==
>      NULL, all
>      > > +    * entries in the queue).
>      > > +    *
>      > > +    * Because we don't want anyone adding additional things to the
>      queue
>      > > +    * while we are doing this, we lock the queue.
>      > > +    */
>      > > +   TCP_LOG_DEV_QUEUE_LOCK();
>      > > +   if (priv->tldi_head != NULL) {
>      > > +           entry = priv->tldi_head;
>      > > +           STAILQ_FOREACH_FROM_SAFE(entry,
>      &tcp_log_dev_queue_head,
>      > > +               tldq_queue, entry_tmp) {
>      > > +                   tcp_log_dev_clear_refcount(entry);
>      > > +           }
>      > > +   }
>      > > +   tcp_log_dev_listeners--;
>      > > +   KASSERT(tcp_log_dev_listeners >= 0,
>      > > +       ("%s: tcp_log_dev_listeners is unexpectedly negative",
>      __func__));
>      > > +   STAILQ_REMOVE(&tcp_log_dev_reader_head, priv,
>      tcp_log_dev_info,
>      > > +       tldi_list);
>      > > +   TCP_LOG_DEV_QUEUE_LOCK_ASSERT();
>      > > +   TCP_LOG_DEV_QUEUE_UNLOCK();
>      > > +   free(priv, M_TCPLOGDEV);
>      > > +}
>      > > +
>      > > +static int
>      > > +tcp_log_dev_open(struct cdev *dev __unused, int flags, int devtype
>      __unused,
>      > > +    struct thread *td __unused)
>      > > +{
>      > > +   struct tcp_log_dev_info *priv;
>      > > +   struct tcp_log_dev_queue *entry;
>      > > +   int rv;
>      > > +
>      > > +   /*
>      > > +    * Ideally, we shouldn't see these because of file system
>      > > +    * permissions.
>      > > +    */
>      > > +   if (flags & (FWRITE | FEXEC | FAPPEND | O_TRUNC))
>      > > +           return (ENODEV);
>      > > +
>      > > +   /* Allocate space to hold information about where we are. */
>      > > +   priv = malloc(sizeof(struct tcp_log_dev_info), M_TCPLOGDEV,
>      > > +       M_ZERO | M_WAITOK);
>      > > +
>      > > +   /* Stash the private data away. */
>      > > +   rv = devfs_set_cdevpriv((void *)priv,
>      tcp_log_dev_clear_cdevpriv);
>      > > +   if (!rv) {
>      > > +           /*
>      > > +            * Increase the listener count, add this reader to
>      the list, and
>      > > +            * take references on all current queues.
>      > > +            */
>      > > +           TCP_LOG_DEV_QUEUE_LOCK();
>      > > +           tcp_log_dev_listeners++;
>      > > +           STAILQ_INSERT_HEAD(&tcp_log_dev_reader_head, priv,
>      tldi_list);
>      > > +           priv->tldi_head =
>      STAILQ_FIRST(&tcp_log_dev_queue_head);
>      > > +           if (priv->tldi_head != NULL)
>      > > +                   priv->tldi_cur =
>      priv->tldi_head->tldq_buf;
>      > > +           STAILQ_FOREACH(entry, &tcp_log_dev_queue_head,
>      tldq_queue)
>      > > +                   TCP_LOG_DEV_QUEUE_REF(entry);
>      > > +           TCP_LOG_DEV_QUEUE_UNLOCK();
>      > > +   } else {
>      > > +           /* Free the entry. */
>      > > +           free(priv, M_TCPLOGDEV);
>      > > +   }
>      > > +   return (rv);
>      > > +}
>      > > +
>      > > +static int
>      > > +tcp_log_dev_write(struct cdev *dev __unused, struct uio *uio
>      __unused,
>      > > +    int flags __unused)
>      > > +{
>      > > +
>      > > +   return (ENODEV);
>      > > +}
>      > > +
>      > > +static __inline void
>      > > +tcp_log_dev_rotate_bufs(struct tcp_log_dev_info *priv, int
>      *lockstate)
>      > > +{
>      > > +   struct tcp_log_dev_queue *entry;
>      > > +
>      > > +   KASSERT(priv->tldi_head != NULL,
>      > > +       ("%s:%d: priv->tldi_head unexpectedly NULL",
>      > > +       __func__, __LINE__));
>      > > +   KASSERT(priv->tldi_head->tldq_buf == priv->tldi_cur,
>      > > +       ("%s:%d: buffer mismatch (%p vs %p)",
>      > > +       __func__, __LINE__, priv->tldi_head->tldq_buf,
>      > > +       priv->tldi_cur));
>      > > +   tcp_log_dev_queue_validate_lock(*lockstate);
>      > > +
>      > > +   if (*lockstate == QUEUE_UNLOCKED) {
>      > > +           TCP_LOG_DEV_QUEUE_LOCK();
>      > > +           *lockstate = QUEUE_LOCKED;
>      > > +   }
>      > > +   entry = priv->tldi_head;
>      > > +   priv->tldi_head = STAILQ_NEXT(entry, tldq_queue);
>      > > +   tcp_log_dev_clear_refcount(entry);
>      > > +   priv->tldi_cur = NULL;
>      > > +}
>      > > +
>      > > +static int
>      > > +tcp_log_dev_read(struct cdev *dev __unused, struct uio *uio, int
>      flags)
>      > > +{
>      > > +   struct tcp_log_common_header *buf;
>      > > +   struct tcp_log_dev_info *priv;
>      > > +   struct tcp_log_dev_queue *entry;
>      > > +   ssize_t len;
>      > > +   int lockstate, rv;
>      > > +
>      > > +   /* Get our private info. */
>      > > +   rv = devfs_get_cdevpriv((void **)&priv);
>      > > +   if (rv)
>      > > +           return (rv);
>      > > +
>      > > +   lockstate = QUEUE_UNLOCKED;
>      > > +
>      > > +   /* Do we need to get a new buffer? */
>      > > +   while (priv->tldi_cur == NULL ||
>      > > +       priv->tldi_cur->tlch_length <= priv->tldi_off) {
>      > > +           /* Did we somehow forget to rotate? */
>      > > +           KASSERT(priv->tldi_cur == NULL,
>      > > +               ("%s:%d: tldi_cur is unexpectedly non-NULL",
>      __func__,
>      > > +               __LINE__));
>      > > +           if (priv->tldi_cur != NULL)
>      > > +                   tcp_log_dev_rotate_bufs(priv,
>      &lockstate);
>      > > +
>      > > +           /*
>      > > +            * Before we start looking at tldi_head, we need a
>      lock on the
>      > > +            * queue to make sure tldi_head stays stable.
>      > > +            */
>      > > +           if (lockstate == QUEUE_UNLOCKED) {
>      > > +                   TCP_LOG_DEV_QUEUE_LOCK();
>      > > +                   lockstate = QUEUE_LOCKED;
>      > > +           }
>      > > +
>      > > +           /* We need the next buffer. Do we have one? */
>      > > +           if (priv->tldi_head == NULL && (flags &
>      FNONBLOCK)) {
>      > > +                   rv = EAGAIN;
>      > > +                   goto done;
>      > > +           }
>      > > +           if (priv->tldi_head == NULL) {
>      > > +                   /* Sleep and wait for more things we
>      can read. */
>      > > +                   rv = mtx_sleep(&tcp_log_dev_listeners,
>      > > +                       &tcp_log_dev_queue_lock, PCATCH,
>      "tcplogdev", 0);
>      > > +                   if (rv)
>      > > +                           goto done;
>      > > +                   if (priv->tldi_head == NULL)
>      > > +                           continue;
>      > > +           }
>      > > +
>      > > +           /*
>      > > +            * We have an entry to read. We want to try to
>      create a
>      > > +            * buffer, if one doesn't already exist.
>      > > +            */
>      > > +           entry = priv->tldi_head;
>      > > +           if (entry->tldq_buf == NULL) {
>      > > +                   TCP_LOG_DEV_QUEUE_LOCK_ASSERT();
>      > > +                   buf = (*entry->tldq_xform)(entry);
>      > > +                   if (buf == NULL) {
>      > > +                           rv = EBUSY;
>      > > +                           goto done;
>      > > +                   }
>      > > +                   entry->tldq_buf = buf;
>      > > +           }
>      > > +
>      > > +           priv->tldi_cur = entry->tldq_buf;
>      > > +           priv->tldi_off = 0;
>      > > +   }
>      > > +
>      > > +   /* Copy what we can from this buffer to the output buffer. */
>      > > +   if (uio->uio_resid > 0) {
>      > > +           /* Drop locks so we can take page faults. */
>      > > +           if (lockstate == QUEUE_LOCKED)
>      > > +                   TCP_LOG_DEV_QUEUE_UNLOCK();
>      > > +           lockstate = QUEUE_UNLOCKED;
>      > > +
>      > > +           KASSERT(priv->tldi_cur != NULL,
>      > > +               ("%s: priv->tldi_cur is unexpectedly NULL",
>      __func__));
>      > > +
>      > > +           /* Copy as much as we can to this uio. */
>      > > +           len = priv->tldi_cur->tlch_length -
>      priv->tldi_off;
>      > > +           if (len > uio->uio_resid)
>      > > +                   len = uio->uio_resid;
>      > > +           rv = uiomove(((uint8_t *)priv->tldi_cur) +
>      priv->tldi_off,
>      > > +               len, uio);
>      > > +           if (rv != 0)
>      > > +                   goto done;
>      > > +           priv->tldi_off += len;
>      > > +#ifdef TCPLOG_DEBUG_COUNTERS
>      > > +           counter_u64_add(tcp_log_que_read, len);
>      > > +#endif
>      > > +   }
>      > > +   /* Are we done with this buffer? If so, find the next one. */
>      > > +   if (priv->tldi_off >= priv->tldi_cur->tlch_length) {
>      > > +           KASSERT(priv->tldi_off ==
>      priv->tldi_cur->tlch_length,
>      > > +               ("%s: offset (%ju) exceeds length (%ju)",
>      __func__,
>      > > +               (uintmax_t)priv->tldi_off,
>      > > +               (uintmax_t)priv->tldi_cur->tlch_length));
>      > > +           tcp_log_dev_rotate_bufs(priv, &lockstate);
>      > > +   }
>      > > +done:
>      > > +   tcp_log_dev_queue_validate_lock(lockstate);
>      > > +   if (lockstate == QUEUE_LOCKED)
>      > > +           TCP_LOG_DEV_QUEUE_UNLOCK();
>      > > +   return (rv);
>      > > +}
>      > > +
>      > > +static int
>      > > +tcp_log_dev_ioctl(struct cdev *dev __unused, u_long cmd, caddr_t
>      data,
>      > > +    int fflag __unused, struct thread *td __unused)
>      > > +{
>      > > +   struct tcp_log_dev_info *priv;
>      > > +   int rv;
>      > > +
>      > > +   /* Get our private info. */
>      > > +   rv = devfs_get_cdevpriv((void **)&priv);
>      > > +   if (rv)
>      > > +           return (rv);
>      > > +
>      > > +   /*
>      > > +    * Set things. Here, we are most concerned about the
>      non-blocking I/O
>      > > +    * flag.
>      > > +    */
>      > > +   rv = 0;
>      > > +   switch (cmd) {
>      > > +   case FIONBIO:
>      > > +           break;
>      > > +   case FIOASYNC:
>      > > +           if (*(int *)data != 0)
>      > > +                   rv = EINVAL;
>      > > +           break;
>      > > +   default:
>      > > +           rv = ENOIOCTL;
>      > > +   }
>      > > +   return (rv);
>      > > +}
>      > > +
>      > > +static int
>      > > +tcp_log_dev_poll(struct cdev *dev __unused, int events, struct
>      thread *td)
>      > > +{
>      > > +   struct tcp_log_dev_info *priv;
>      > > +   int revents;
>      > > +
>      > > +   /*
>      > > +    * Get our private info. If this fails, claim that all events
>      are
>      > > +    * ready. That should prod the user to do something that will
>      > > +    * make the error evident to them.
>      > > +    */
>      > > +   if (devfs_get_cdevpriv((void **)&priv))
>      > > +           return (events);
>      > > +
>      > > +   revents = 0;
>      > > +   if (events & (POLLIN | POLLRDNORM)) {
>      > > +           /*
>      > > +            * We can (probably) read right now if we are
>      partway through
>      > > +            * a buffer or if we are just about to start a
>      buffer.
>      > > +            * Because we are going to read tldi_head, we
>      should acquire
>      > > +            * a read lock on the queue.
>      > > +            */
>      > > +           TCP_LOG_DEV_QUEUE_LOCK();
>      > > +           if ((priv->tldi_head != NULL && priv->tldi_cur ==
>      NULL) ||
>      > > +               (priv->tldi_cur != NULL &&
>      > > +               priv->tldi_off <
>      priv->tldi_cur->tlch_length))
>      > > +                   revents = events & (POLLIN |
>      POLLRDNORM);
>      > > +           else
>      > > +                   selrecord(td, &tcp_log_sel);
>      > > +           TCP_LOG_DEV_QUEUE_UNLOCK();
>      > > +   } else {
>      > > +           /*
>      > > +            * It only makes sense to poll for reading. So,
>      again, prod the
>      > > +            * user to do something that will make the error
>      of their ways
>      > > +            * apparent.
>      > > +            */
>      > > +           revents = events;
>      > > +   }
>      > > +   return (revents);
>      > > +}
>      > > +
>      > > +int
>      > > +tcp_log_dev_add_log(struct tcp_log_dev_queue *entry)
>      > > +{
>      > > +   struct tcp_log_dev_info *priv;
>      > > +   int rv;
>      > > +   bool wakeup_needed;
>      > > +
>      > > +   KASSERT(entry->tldq_buf != NULL || entry->tldq_xform != NULL,
>      > > +       ("%s: Called with both tldq_buf and tldq_xform set to
>      NULL",
>      > > +       __func__));
>      > > +   KASSERT(entry->tldq_dtor != NULL,
>      > > +       ("%s: Called with tldq_dtor set to NULL", __func__));
>      > > +
>      > > +   /* Get a lock on the queue. */
>      > > +   TCP_LOG_DEV_QUEUE_LOCK();
>      > > +
>      > > +   /* If no one is listening, tell the caller to free the
>      resources. */
>      > > +   if (tcp_log_dev_listeners == 0) {
>      > > +           rv = ENXIO;
>      > > +           goto done;
>      > > +   }
>      > > +
>      > > +   /* Add this to the end of the tailq. */
>      > > +   STAILQ_INSERT_TAIL(&tcp_log_dev_queue_head, entry,
>      tldq_queue);
>      > > +
>      > > +   /* Add references for all current listeners. */
>      > > +   refcount_init(&entry->tldq_refcnt, tcp_log_dev_listeners);
>      > > +
>      > > +   /*
>      > > +    * If any listener is currently stuck on NULL, that means they
>      are
>      > > +    * waiting. Point their head to this new entry.
>      > > +    */
>      > > +   wakeup_needed = false;
>      > > +   STAILQ_FOREACH(priv, &tcp_log_dev_reader_head, tldi_list)
>      > > +           if (priv->tldi_head == NULL) {
>      > > +                   priv->tldi_head = entry;
>      > > +                   wakeup_needed = true;
>      > > +           }
>      > > +
>      > > +   if (wakeup_needed) {
>      > > +           selwakeup(&tcp_log_sel);
>      > > +           wakeup(&tcp_log_dev_listeners);
>      > > +   }
>      > > +
>      > > +   rv = 0;
>      > > +
>      > > +done:
>      > > +   TCP_LOG_DEV_QUEUE_LOCK_ASSERT();
>      > > +   TCP_LOG_DEV_QUEUE_UNLOCK();
>      > > +   return (rv);
>      > > +}
>      > > +
>      > > +static int
>      > > +tcp_log_dev_modevent(module_t mod __unused, int type, void *data
>      __unused)
>      > > +{
>      > > +
>      > > +   /* TODO: Support intelligent unloading. */
>      > > +   switch (type) {
>      > > +   case MOD_LOAD:
>      > > +           if (bootverbose)
>      > > +                   printf("tcp_log: tcp_log device\n");
>      > > +           memset(&tcp_log_sel, 0, sizeof(tcp_log_sel));
>      > > +           memset(&tcp_log_dev_queue_lock, 0, sizeof(struct
>      mtx));
>      > > +           mtx_init(&tcp_log_dev_queue_lock, "tcp_log dev",
>      > > +                    "tcp_log device queues", MTX_DEF);
>      > > +           tcp_log_dev = make_dev_credf(MAKEDEV_ETERNAL_KLD,
>      > > +               &tcp_log_cdevsw, 0, NULL, UID_ROOT,
>      GID_WHEEL, 0400,
>      > > +               "tcp_log");
>      > > +           break;
>      > > +   default:
>      > > +           return (EOPNOTSUPP);
>      > > +   }
>      > > +
>      > > +   return (0);
>      > > +}
>      > > +
>      > > +DEV_MODULE(tcp_log_dev, tcp_log_dev_modevent, NULL);
>      > > +MODULE_VERSION(tcp_log_dev, 1);
>      > >
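
As I read the file above, each queued entry starts with one reference
per open reader (refcount_init to tcp_log_dev_listeners), every reader
drops its reference when it rotates past the entry, and the last one to
do so unlinks the entry and runs its destructor. Below is a small
single-threaded userspace model of that scheme, just to make the
lifetime rule concrete. All names (struct logq, logq_add, reader_rotate,
counting_dtor, ...) are invented for the sketch, it uses a plain linked
list instead of STAILQ, and it leaves out the locking and sleeping that
the real code does.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct entry {
	struct entry *next;
	int refcnt;			/* readers that still need this entry */
	void (*dtor)(struct entry *);	/* called when refcnt reaches zero */
};

struct reader {
	struct entry *head;		/* next entry this reader will consume */
};

struct logq {
	struct entry *first, *last;
	struct reader *readers;		/* array of open readers */
	int listeners;			/* number of elements in readers */
};

/* Illustrative destructor that only counts how often it ran. */
static int entries_destroyed;

static void
counting_dtor(struct entry *e)
{
	(void)e;
	entries_destroyed++;
}

/* Drop one reference; on the last one, unlink the entry and destroy it. */
static void
logq_unref(struct logq *q, struct entry *e)
{
	struct entry *prev = NULL, *cur = q->first;

	if (--e->refcnt > 0)
		return;
	while (cur != e) {		/* e is assumed to be on the queue */
		prev = cur;
		cur = cur->next;
	}
	if (prev == NULL)
		q->first = e->next;
	else
		prev->next = e->next;
	if (q->last == e)
		q->last = prev;
	e->dtor(e);
}

/* Queue an entry; returns ENXIO when nobody is listening. */
static int
logq_add(struct logq *q, struct entry *e)
{
	int i;

	if (q->listeners == 0)
		return (ENXIO);
	e->next = NULL;
	e->refcnt = q->listeners;	/* one reference per open reader */
	if (q->last != NULL)
		q->last->next = e;
	else
		q->first = e;
	q->last = e;
	/* Readers parked on NULL were waiting; point them at the new entry. */
	for (i = 0; i < q->listeners; i++)
		if (q->readers[i].head == NULL)
			q->readers[i].head = e;
	return (0);
}

/* The reader is done with its current entry: advance and drop the ref. */
static void
reader_rotate(struct logq *q, struct reader *r)
{
	struct entry *e = r->head;

	assert(e != NULL);
	r->head = e->next;		/* read next before e can be destroyed */
	logq_unref(q, e);
}
```

With two readers, an entry survives the first reader_rotate() and is
destroyed on the second, which matches what tcp_log_dev_clear_refcount()
appears to do under the queue lock.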
>      > > Added: head/sys/dev/tcp_log/tcp_log_dev.h
>      > >
>      ==============================================================================
>      > > --- /dev/null       00:00:00 1970   (empty, because file is
>      newly added)
>      > > +++ head/sys/dev/tcp_log/tcp_log_dev.h      Thu Mar 22 09:40:08
>      2018        (r331347)
>      > > @@ -0,0 +1,88 @@
>      > > +/*-
>      > > + * SPDX-License-Identifier: BSD-2-Clause-FreeBSD
>      > > + *
>      > > + * Copyright (c) 2016
>      > > + * Netflix Inc.  All rights reserved.
>      > > + *
>      > > + * Redistribution and use in source and binary forms, with or
>      without
>      > > + * modification, are permitted provided that the following
>      conditions
>      > > + * are met:
>      > > + * 1. Redistributions of source code must retain the above
>      copyright
>      > > + *    notice, this list of conditions and the following
>      disclaimer.
>      > > + * 2. Redistributions in binary form must reproduce the above
>      copyright
>      > > + *    notice, this list of conditions and the following
>      disclaimer in the
>      > > + *    documentation and/or other materials provided with the
>      distribution.
>      > > + *
>      > > + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS
>      IS'' AND
>      > > + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
>      TO, THE
>      > > + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
>      PARTICULAR PURPOSE
>      > > + * ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS
>      BE LIABLE
>      > > + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
>      CONSEQUENTIAL
>      > > + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
>      SUBSTITUTE GOODS
>      > > + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
>      INTERRUPTION)
>      > > + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
>      CONTRACT, STRICT
>      > > + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
>      IN ANY WAY
>      > > + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
>      POSSIBILITY OF
>      > > + * SUCH DAMAGE.
>      > > + *
>      > > + * $FreeBSD$
>      > > + */
>      > > +
>      > > +#ifndef __tcp_log_dev_h__
>      > > +#define    __tcp_log_dev_h__
>      > > +
>      > > +/*
>      > > + * This is the common header for data streamed from the log device.
>      All
>      > > + * blocks of data need to start with this header.
>      > > + */
>      > > +struct tcp_log_common_header {
>      > > +   uint32_t        tlch_version;   /* Version is specific
>      to type. */
>      > > +   uint32_t        tlch_type;      /* Type of entry(ies)
>      that follow. */
>      > > +   uint64_t        tlch_length;    /* Total length,
>      including header. */
>      > > +} __packed;
>      > > +
>      > > +#define    TCP_LOG_DEV_TYPE_BBR    1       /* black box
>      recorder */
>      > > +
>      > > +#ifdef _KERNEL
>      > > +/*
>      > > + * This is a queue entry. All queue entries need to start with this
>      structure
>      > > + * so the common code can cast them to this structure; however,
>      other modules
>      > > + * are free to include additional data after this structure.
>      > > + *
>      > > + * The elements are explained here:
>      > > + * tldq_queue: used by the common code to maintain this entry's
>      position in the
>      > > + *     queue.
>      > > + * tldq_buf: should be NULL, or a pointer to a chunk of data. The
>      data must be
>      > > + *     as long as the common header indicates.
>      > > + * tldq_xform: If tldq_buf is NULL, the code will call this to
>      create the
>      > > + *     the tldq_buf object. The function should *not* directly
>      modify tldq_buf,
>      > > + *     but should return the buffer (which must meet the
>      restrictions
>      > > + *     indicated for tldq_buf).
>      > > + * tldq_dtor: This function is called to free the queue entry. If
>      tldq_buf is
>      > > + *     not NULL, the dtor function must free that, too.
>      > > + * tldq_refcnt: used by the common code to indicate how many
>      readers still need
>      > > + *     this data.
>      > > + */
>      > > +struct tcp_log_dev_queue {
>      > > +   STAILQ_ENTRY(tcp_log_dev_queue) tldq_queue;
>      > > +   struct tcp_log_common_header *tldq_buf;
>      > > +   struct tcp_log_common_header *(*tldq_xform)(struct
>      tcp_log_dev_queue *entry);
>      > > +   void    (*tldq_dtor)(struct tcp_log_dev_queue *entry);
>      > > +   volatile u_int tldq_refcnt;
>      > > +};
>      > > +
>      > > +STAILQ_HEAD(log_queueh, tcp_log_dev_queue);
>      > > +
>      > > +struct tcp_log_dev_info {
>      > > +   STAILQ_ENTRY(tcp_log_dev_info) tldi_list;
>      > > +   struct tcp_log_dev_queue *tldi_head;
>      > > +   struct tcp_log_common_header *tldi_cur;
>      > > +   off_t                   tldi_off;
>      > > +};
>      > > +STAILQ_HEAD(log_infoh, tcp_log_dev_info);
>      > > +
>      > > +
>      > > +MALLOC_DECLARE(M_TCPLOGDEV);
>      > > +int tcp_log_dev_add_log(struct tcp_log_dev_queue *entry);
>      > > +#endif /* _KERNEL */
>      > > +#endif /* !__tcp_log_dev_h__ */
>      > >
>      > > Modified: head/sys/kern/subr_witness.c
>      > >
>      ==============================================================================
>      > > --- head/sys/kern/subr_witness.c    Thu Mar 22 08:32:39 2018   
>          (r331346)
>      > > +++ head/sys/kern/subr_witness.c    Thu Mar 22 09:40:08 2018   
>          (r331347)
>      > > @@ -640,6 +640,14 @@ static struct witness_order_list_entry
>      order_lists[] =
>      > >     { "db->db_mtx", &lock_class_sx },
>      > >     { NULL, NULL },
>      > >     /*
>      > > +    * TCP log locks
>      > > +    */
>      > > +   { "TCP ID tree", &lock_class_rw },
>      > > +   { "tcp log id bucket", &lock_class_mtx_sleep },
>      > > +   { "tcpinp", &lock_class_rw },
>      > > +   { "TCP log expireq", &lock_class_mtx_sleep },
>      > > +   { NULL, NULL },
>      > > +   /*
>      > >      * spin locks
>      > >      */
>      > >  #ifdef SMP
>      > >
>      > > Modified: head/sys/netinet/tcp.h
>      > >
>      ==============================================================================
>      > > --- head/sys/netinet/tcp.h  Thu Mar 22 08:32:39 2018       
>      (r331346)
>      > > +++ head/sys/netinet/tcp.h  Thu Mar 22 09:40:08 2018       
>      (r331347)
>      > > @@ -168,6 +168,12 @@ struct tcphdr {
>      > >  #define TCP_NOOPT  8       /* don't use TCP options */
>      > >  #define TCP_MD5SIG 16      /* use MD5 digests (RFC2385) */
>      > >  #define    TCP_INFO        32      /* retrieve tcp_info
>      structure */
>      > > +#define    TCP_LOG         34      /* configure event
>      logging for connection */
>      > > +#define    TCP_LOGBUF      35      /* retrieve event log
>      for connection */
>      > > +#define    TCP_LOGID       36      /* configure log ID to
>      correlate connections */
>      > > +#define    TCP_LOGDUMP     37      /* dump connection log
>      events to device */
>      > > +#define    TCP_LOGDUMPID   38      /* dump events from
>      connections with same ID to
>      > > +                              device */
>      > >  #define    TCP_CONGESTION  64      /* get/set congestion
>      control algorithm */
>      > >  #define    TCP_CCALGOOPT   65      /* get/set cc algorithm
>      specific options */
>      > >  #define    TCP_KEEPINIT    128     /* N, time to establish
>      connection */
>      > > @@ -188,6 +194,9 @@ struct tcphdr {
>      > >  #define    TCPI_OPT_WSCALE         0x04
>      > >  #define    TCPI_OPT_ECN            0x08
>      > >  #define    TCPI_OPT_TOE            0x10
>      > > +
>      > > +/* Maximum length of log ID. */
>      > > +#define TCP_LOG_ID_LEN     64
>      > >
>      > >  /*
>      > >   * The TCP_INFO socket option comes from the Linux 2.6 TCP API,
>      and permits
>      > >
>      > > Modified: head/sys/netinet/tcp_input.c
>      > >
>      ==============================================================================
>      > > --- head/sys/netinet/tcp_input.c    Thu Mar 22 08:32:39 2018   
>          (r331346)
>      > > +++ head/sys/netinet/tcp_input.c    Thu Mar 22 09:40:08 2018   
>          (r331347)
>      > > @@ -102,6 +102,7 @@ __FBSDID("$FreeBSD$");
>      > >  #include <netinet6/nd6.h>
>      > >  #include <netinet/tcp.h>
>      > >  #include <netinet/tcp_fsm.h>
>      > > +#include <netinet/tcp_log_buf.h>
>      > >  #include <netinet/tcp_seq.h>
>      > >  #include <netinet/tcp_timer.h>
>      > >  #include <netinet/tcp_var.h>
>      > > @@ -1592,6 +1593,8 @@ tcp_do_segment(struct mbuf *m, struct tcphdr
>      *th, stru
>      > >     /* Save segment, if requested. */
>      > >     tcp_pcap_add(th, m, &(tp->t_inpkts));
>      > >  #endif
>      > > +   TCP_LOG_EVENT(tp, th, &so->so_rcv, &so->so_snd, TCP_LOG_IN, 0,
>      > > +       tlen, NULL, true);
>      > >
>      > >     if ((thflags & TH_SYN) && (thflags & TH_FIN) &&
>      V_drop_synfin) {
>      > >             if ((s = tcp_log_addrs(inc, th, NULL, NULL))) {
>      > >
>      > > Added: head/sys/netinet/tcp_log_buf.c
>      > >
>      ==============================================================================
>      > > --- /dev/null       00:00:00 1970   (empty, because file is
>      newly added)
>      > > +++ head/sys/netinet/tcp_log_buf.c  Thu Mar 22 09:40:08 2018   
>          (r331347)
>      > > @@ -0,0 +1,2480 @@
>      > > +/*-
>      > > + * SPDX-License-Identifier: BSD-2-Clause-FreeBSD
>      > > + *
>      > > + * Copyright (c) 2016-2018
>      > > + * Netflix Inc.  All rights reserved.
>      > > + *
>      > > + * Redistribution and use in source and binary forms, with or
>      without
>      > > + * modification, are permitted provided that the following
>      conditions
>      > > + * are met:
>      > > + * 1. Redistributions of source code must retain the above
>      copyright
>      > > + *    notice, this list of conditions and the following
>      disclaimer.
>      > > + * 2. Redistributions in binary form must reproduce the above
>      copyright
>      > > + *    notice, this list of conditions and the following
>      disclaimer in the
>      > > + *    documentation and/or other materials provided with the
>      distribution.
>      > > + *
>      > > + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS
>      IS'' AND
>      > > + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
>      TO, THE
>      > > + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
>      PARTICULAR PURPOSE
>      > > + * ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS
>      BE LIABLE
>      > > + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
>      CONSEQUENTIAL
>      > > + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
>      SUBSTITUTE GOODS
>      > > + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
>      INTERRUPTION)
>      > > + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
>      CONTRACT, STRICT
>      > > + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
>      IN ANY WAY
>      > > + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
>      POSSIBILITY OF
>      > > + * SUCH DAMAGE.
>      > > + *
>      > > + */
>      > > +
>      > > +#include <sys/cdefs.h>
>      > > +__FBSDID("$FreeBSD$");
>      > > +
>      > > +#include <sys/param.h>
>      > > +#include <sys/kernel.h>
>      > > +#include <sys/lock.h>
>      > > +#include <sys/malloc.h>
>      > > +#include <sys/mutex.h>
>      > > +#include <sys/queue.h>
>      > > +#include <sys/refcount.h>
>      > > +#include <sys/rwlock.h>
>      > > +#include <sys/socket.h>
>      > > +#include <sys/socketvar.h>
>      > > +#include <sys/sysctl.h>
>      > > +#include <sys/tree.h>
>      > > +#include <sys/counter.h>
>      > > +
>      > > +#include <dev/tcp_log/tcp_log_dev.h>
>      > > +
>      > > +#include <net/if.h>
>      > > +#include <net/if_var.h>
>      > > +#include <net/vnet.h>
>      > > +
>      > > +#include <netinet/in.h>
>      > > +#include <netinet/in_pcb.h>
>      > > +#include <netinet/in_var.h>
>      > > +#include <netinet/tcp_var.h>
>      > > +#include <netinet/tcp_log_buf.h>
>      > > +
>      > > +/* Default expiry time */
>      > > +#define    TCP_LOG_EXPIRE_TIME     ((sbintime_t)60 * SBT_1S)
>      > > +
>      > > +/* Max interval at which to run the expiry timer */
>      > > +#define    TCP_LOG_EXPIRE_INTVL    ((sbintime_t)5 * SBT_1S)
>      > > +
>      > > +bool       tcp_log_verbose;
>      > > +static uma_zone_t tcp_log_bucket_zone, tcp_log_node_zone,
>      tcp_log_zone;
>      > > +static int tcp_log_session_limit =
>      TCP_LOG_BUF_DEFAULT_SESSION_LIMIT;
>      > > +static uint32_t    tcp_log_version = TCP_LOG_BUF_VER;
>      > > +RB_HEAD(tcp_log_id_tree, tcp_log_id_bucket);
>      > > +static struct tcp_log_id_tree tcp_log_id_head;
>      > > +static STAILQ_HEAD(, tcp_log_id_node) tcp_log_expireq_head =
>      > > +    STAILQ_HEAD_INITIALIZER(tcp_log_expireq_head);
>      > > +static struct mtx tcp_log_expireq_mtx;
>      > > +static struct callout tcp_log_expireq_callout;
>      > > +static uint64_t tcp_log_auto_ratio = 0;
>      > > +static uint64_t tcp_log_auto_ratio_cur = 0;
>      > > +static uint32_t tcp_log_auto_mode = TCP_LOG_STATE_TAIL;
>      > > +static bool tcp_log_auto_all = false;
>      > > +
>      > > +RB_PROTOTYPE_STATIC(tcp_log_id_tree, tcp_log_id_bucket, tlb_rb,
>      tcp_log_id_cmp)
>      > > +
>      > > +SYSCTL_NODE(_net_inet_tcp, OID_AUTO, bb, CTLFLAG_RW, 0, "TCP Black
>      Box controls");
>      > > +
>      > > +SYSCTL_BOOL(_net_inet_tcp_bb, OID_AUTO, log_verbose, CTLFLAG_RW,
>      &tcp_log_verbose,
>      > > +    0, "Force verbose logging for TCP traces");
>      > > +
>      > > +SYSCTL_INT(_net_inet_tcp_bb, OID_AUTO, log_session_limit,
>      > > +    CTLFLAG_RW, &tcp_log_session_limit, 0,
>      > > +    "Maximum number of events maintained for each TCP session");
>      > > +
>      > > +SYSCTL_UMA_MAX(_net_inet_tcp_bb, OID_AUTO, log_global_limit,
>      CTLFLAG_RW,
>      > > +    &tcp_log_zone, "Maximum number of events maintained for all
>      TCP sessions");
>      > > +
>      > > +SYSCTL_UMA_CUR(_net_inet_tcp_bb, OID_AUTO, log_global_entries,
>      CTLFLAG_RD,
>      > > +    &tcp_log_zone, "Current number of events maintained for all
>      TCP sessions");
>      > > +
>      > > +SYSCTL_UMA_MAX(_net_inet_tcp_bb, OID_AUTO, log_id_limit,
>      CTLFLAG_RW,
>      > > +    &tcp_log_bucket_zone, "Maximum number of log IDs");
>      > > +
>      > > +SYSCTL_UMA_CUR(_net_inet_tcp_bb, OID_AUTO, log_id_entries,
>      CTLFLAG_RD,
>      > > +    &tcp_log_bucket_zone, "Current number of log IDs");
>      > > +
>      > > +SYSCTL_UMA_MAX(_net_inet_tcp_bb, OID_AUTO, log_id_tcpcb_limit,
>      CTLFLAG_RW,
>      > > +    &tcp_log_node_zone, "Maximum number of tcpcbs with log IDs");
>      > > +
>      > > +SYSCTL_UMA_CUR(_net_inet_tcp_bb, OID_AUTO, log_id_tcpcb_entries,
>      CTLFLAG_RD,
>      > > +    &tcp_log_node_zone, "Current number of tcpcbs with log IDs");
>      > > +
>      > > +SYSCTL_U32(_net_inet_tcp_bb, OID_AUTO, log_version, CTLFLAG_RD,
>      &tcp_log_version,
>      > > +    0, "Version of log formats exported");
>      > > +
>      > > +SYSCTL_U64(_net_inet_tcp_bb, OID_AUTO, log_auto_ratio, CTLFLAG_RW,
>      > > +    &tcp_log_auto_ratio, 0, "Do auto capturing for 1 out of N
>      sessions");
>      > > +
>      > > +SYSCTL_U32(_net_inet_tcp_bb, OID_AUTO, log_auto_mode, CTLFLAG_RW,
>      > > +    &tcp_log_auto_mode, TCP_LOG_STATE_HEAD_AUTO,
>      > > +    "Logging mode for auto-selected sessions (default is
>      TCP_LOG_STATE_HEAD_AUTO)");
>      > > +
>      > > +SYSCTL_BOOL(_net_inet_tcp_bb, OID_AUTO, log_auto_all, CTLFLAG_RW,
>      > > +    &tcp_log_auto_all, false,
>      > > +    "Auto-select from all sessions (rather than just those with
>      IDs)");
>      > > +
>      > > +#ifdef TCPLOG_DEBUG_COUNTERS
>      > > +counter_u64_t tcp_log_queued;
>      > > +counter_u64_t tcp_log_que_fail1;
>      > > +counter_u64_t tcp_log_que_fail2;
>      > > +counter_u64_t tcp_log_que_fail3;
>      > > +counter_u64_t tcp_log_que_fail4;
>      > > +counter_u64_t tcp_log_que_fail5;
>      > > +counter_u64_t tcp_log_que_copyout;
>      > > +counter_u64_t tcp_log_que_read;
>      > > +counter_u64_t tcp_log_que_freed;
>      > > +
>      > > +SYSCTL_COUNTER_U64(_net_inet_tcp_bb, OID_AUTO, queued, CTLFLAG_RD,
>      > > +    &tcp_log_queued, "Number of entries queued");
>      > > +SYSCTL_COUNTER_U64(_net_inet_tcp_bb, OID_AUTO, fail1, CTLFLAG_RD,
>      > > +    &tcp_log_que_fail1, "Number of entries queued but fail 1");
>      > > +SYSCTL_COUNTER_U64(_net_inet_tcp_bb, OID_AUTO, fail2, CTLFLAG_RD,
>      > > +    &tcp_log_que_fail2, "Number of entries queued but fail 2");
>      > > +SYSCTL_COUNTER_U64(_net_inet_tcp_bb, OID_AUTO, fail3, CTLFLAG_RD,
>      > > +    &tcp_log_que_fail3, "Number of entries queued but fail 3");
>      > > +SYSCTL_COUNTER_U64(_net_inet_tcp_bb, OID_AUTO, fail4, CTLFLAG_RD,
>      > > +    &tcp_log_que_fail4, "Number of entries queued but fail 4");
>      > > +SYSCTL_COUNTER_U64(_net_inet_tcp_bb, OID_AUTO, fail5, CTLFLAG_RD,
>      > > +    &tcp_log_que_fail5, "Number of entries queued but fail 4");
>      > > +SYSCTL_COUNTER_U64(_net_inet_tcp_bb, OID_AUTO, copyout, CTLFLAG_RD,
>      > > +    &tcp_log_que_copyout, "Number of entries copied out");
>      > > +SYSCTL_COUNTER_U64(_net_inet_tcp_bb, OID_AUTO, read, CTLFLAG_RD,
>      > > +    &tcp_log_que_read, "Number of entries read from the queue");
>      > > +SYSCTL_COUNTER_U64(_net_inet_tcp_bb, OID_AUTO, freed, CTLFLAG_RD,
>      > > +    &tcp_log_que_freed, "Number of entries freed after reading");
>      > > +#endif
>      > > +
>      > > +#ifdef INVARIANTS
>      > > +#define    TCPLOG_DEBUG_RINGBUF
>      > > +#endif
>      > > +
>      > > +struct tcp_log_mem
>      > > +{
>      > > +   STAILQ_ENTRY(tcp_log_mem) tlm_queue;
>      > > +   struct tcp_log_buffer   tlm_buf;
>      > > +   struct tcp_log_verbose  tlm_v;
>      > > +#ifdef TCPLOG_DEBUG_RINGBUF
>      > > +   volatile int            tlm_refcnt;
>      > > +#endif
>      > > +};
>      > > +
>      > > +/* 60 bytes for the header, + 16 bytes for padding */
>      > > +static uint8_t     zerobuf[76];
>      > > +
>      > > +/*
>      > > + * Lock order:
>      > > + * 1. TCPID_TREE
>      > > + * 2. TCPID_BUCKET
>      > > + * 3. INP
>      > > + *
>      > > + * Rules:
>      > > + * A. You need a lock on the Tree to add/remove buckets.
>      > > + * B. You need a lock on the bucket to add/remove nodes from the
>      bucket.
>      > > + * C. To change information in a node, you need the INP lock if the
>      tln_closed
>      > > + *    field is false. Otherwise, you need the bucket lock. (Note
>      that the
>      > > + *    tln_closed field can change at any point, so you need to
>      recheck the
>      > > + *    entry after acquiring the INP lock.)
>      > > + * D. To remove a node from the bucket, you must have that entry
>      locked,
>      > > + *    according to the criteria of Rule C. Also, the node must
>      not be on
>      > > + *    the expiry queue.
>      > > + * E. The exception to C is the expiry queue fields, which are
>      locked by
>      > > + *    the TCPLOG_EXPIREQ lock.
>      > > + *
>      > > + * Buckets have a reference count. Each node is a reference.
>      Further,
>      > > + * other callers may add reference counts to keep a bucket from
>      disappearing.
>      > > + * You can add a reference as long as you own a lock sufficient to
>      keep the
>      > > + * bucket from disappearing. For example, a common use is:
>      > > + *   a. Have a locked INP, but need to lock the TCPID_BUCKET.
>      > > + *   b. Add a refcount on the bucket. (Safe because the INP lock
>      prevents
>      > > + *      the TCPID_BUCKET from going away.)
>      > > + *   c. Drop the INP lock.
>      > > + *   d. Acquire a lock on the TCPID_BUCKET.
>      > > + *   e. Acquire a lock on the INP.
>      > > + *   f. Drop the refcount on the bucket.
>      > > + *      (At this point, the bucket may disappear.)
>      > > + *
>      > > + * Expire queue lock:
>      > > + * You can acquire this with either the bucket or INP lock. Don't
>      reverse it.
>      > > + * When the expire code has committed to freeing a node, it resets
>      the expiry
>      > > + * time to SBT_MAX. That is the signal to everyone else that they
>      should
>      > > + * leave that node alone.
>      > > + */
>      > > +static struct rwlock tcp_id_tree_lock;
>      > > +#define    TCPID_TREE_WLOCK()             
>      rw_wlock(&tcp_id_tree_lock)
>      > > +#define    TCPID_TREE_RLOCK()             
>      rw_rlock(&tcp_id_tree_lock)
>      > > +#define    TCPID_TREE_UPGRADE()           
>      rw_try_upgrade(&tcp_id_tree_lock)
>      > > +#define    TCPID_TREE_WUNLOCK()           
>      rw_wunlock(&tcp_id_tree_lock)
>      > > +#define    TCPID_TREE_RUNLOCK()           
>      rw_runlock(&tcp_id_tree_lock)
>      > > +#define    TCPID_TREE_WLOCK_ASSERT()     
>       rw_assert(&tcp_id_tree_lock, RA_WLOCKED)
>      > > +#define    TCPID_TREE_RLOCK_ASSERT()     
>       rw_assert(&tcp_id_tree_lock, RA_RLOCKED)
>      > > +#define    TCPID_TREE_UNLOCK_ASSERT()     
>      rw_assert(&tcp_id_tree_lock, RA_UNLOCKED)
>      > > +
>      > > +#define    TCPID_BUCKET_LOCK_INIT(tlb)   
>       mtx_init(&((tlb)->tlb_mtx), "tcp log id bucket", NULL, MTX_DEF)
>      > > +#define    TCPID_BUCKET_LOCK_DESTROY(tlb) 
>      mtx_destroy(&((tlb)->tlb_mtx))
>      > > +#define    TCPID_BUCKET_LOCK(tlb)         
>      mtx_lock(&((tlb)->tlb_mtx))
>      > > +#define    TCPID_BUCKET_UNLOCK(tlb)       
>      mtx_unlock(&((tlb)->tlb_mtx))
>      > > +#define    TCPID_BUCKET_LOCK_ASSERT(tlb) 
>       mtx_assert(&((tlb)->tlb_mtx), MA_OWNED)
>      > > +#define    TCPID_BUCKET_UNLOCK_ASSERT(tlb)
>      mtx_assert(&((tlb)->tlb_mtx), MA_NOTOWNED)
>      > > +
>      > > +#define    TCPID_BUCKET_REF(tlb)         
>       refcount_acquire(&((tlb)->tlb_refcnt))
>      > > +#define    TCPID_BUCKET_UNREF(tlb)       
>       refcount_release(&((tlb)->tlb_refcnt))
>      > > +
>      > > +#define    TCPLOG_EXPIREQ_LOCK()         
>       mtx_lock(&tcp_log_expireq_mtx)
>      > > +#define    TCPLOG_EXPIREQ_UNLOCK()       
>       mtx_unlock(&tcp_log_expireq_mtx)
>      > > +
>      > > +SLIST_HEAD(tcp_log_id_head, tcp_log_id_node);
>      > > +
>      > > +struct tcp_log_id_bucket
>      > > +{
>      > > +   /*
>      > > +    * tlb_id must be first. This lets us use strcmp on
>      > > +    * (struct tcp_log_id_bucket *) and (char *) interchangeably.
>      > > +    */
>      > > +   char                           
>      tlb_id[TCP_LOG_ID_LEN];
>      > > +   RB_ENTRY(tcp_log_id_bucket)     tlb_rb;
>      > > +   struct tcp_log_id_head          tlb_head;
>      > > +   struct mtx                      tlb_mtx;
>      > > +   volatile u_int                  tlb_refcnt;
>      > > +};
>      > > +
>      > > +struct tcp_log_id_node
>      > > +{
>      > > +   SLIST_ENTRY(tcp_log_id_node) tln_list;
>      > > +   STAILQ_ENTRY(tcp_log_id_node) tln_expireq; /* Locked by the
>      expireq lock */
>      > > +   sbintime_t              tln_expiretime; /* Locked by
>      the expireq lock */
>      > > +
>      > > +   /*
>      > > +    * If INP is NULL, that means the connection has closed. We've
>      > > +    * saved the connection endpoint information and the log
>      entries
>      > > +    * in the tln_ie and tln_entries members. We've also saved a
>      pointer
>      > > +    * to the enclosing bucket here. If INP is not NULL, the
>      information is
>      > > +    * in the PCB and not here.
>      > > +    */
>      > > +   struct inpcb            *tln_inp;
>      > > +   struct tcpcb            *tln_tp;
>      > > +   struct tcp_log_id_bucket *tln_bucket;
>      > > +   struct in_endpoints     tln_ie;
>      > > +   struct tcp_log_stailq   tln_entries;
>      > > +   int                     tln_count;
>      > > +   volatile int            tln_closed;
>      > > +   uint8_t                 tln_af;
>      > > +};
>      > > +
>      > > +enum tree_lock_state {
>      > > +   TREE_UNLOCKED = 0,
>      > > +   TREE_RLOCKED,
>      > > +   TREE_WLOCKED,
>      > > +};
>      > > +
>      > > +/* Do we want to select this session for auto-logging? */
>      > > +static __inline bool
>      > > +tcp_log_selectauto(void)
>      > > +{
>      > > +
>      > > +   /*
>      > >
>      > > *** DIFF OUTPUT TRUNCATED AT 1000 LINES ***
>      > >
>      >
> 
> References
> 
>    Visible links
>    1. mailto:ruslan.bukin at cl.cam.ac.uk
>    2. https://svnweb.freebsd.org/changeset/base/331347
>    3. https://reviews.freebsd.org/D11085
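
For readers following the quoted tcp_log_dev.h: the comment there says every block streamed from the log device starts with struct tcp_log_common_header, and that tlch_length is the total length including the header. A rough userland sketch of walking a buffer of such blocks is below. The struct fields and TCP_LOG_DEV_TYPE_BBR are copied from the diff; count_log_blocks() is a hypothetical helper, not part of the commit, and it assumes blocks are laid out back to back in host byte order, which the quoted portion of the diff does not confirm.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Userland mirror of the header in the quoted tcp_log_dev.h. */
struct tcp_log_common_header {
	uint32_t	tlch_version;	/* Version is specific to type. */
	uint32_t	tlch_type;	/* Type of entry(ies) that follow. */
	uint64_t	tlch_length;	/* Total length, including header. */
} __attribute__((__packed__));

#define	TCP_LOG_DEV_TYPE_BBR	1	/* black box recorder */

/*
 * Hypothetical helper (not in the commit): count the complete blocks in
 * buf[0..len), assuming blocks follow each other with no padding.
 * Returns -1 if a header is truncated, or if tlch_length is shorter than
 * the header or runs past the end of the buffer.
 */
static int
count_log_blocks(const uint8_t *buf, size_t len)
{
	struct tcp_log_common_header hdr;
	size_t off = 0;
	int n = 0;

	while (off < len) {
		if (len - off < sizeof(hdr))
			return (-1);
		/* Copy out the header to avoid unaligned loads. */
		memcpy(&hdr, buf + off, sizeof(hdr));
		if (hdr.tlch_length < sizeof(hdr) ||
		    hdr.tlch_length > len - off)
			return (-1);
		off += (size_t)hdr.tlch_length;
		n++;
	}
	return (n);
}
```

Validating tlch_length before advancing keeps a corrupt or short read from looping forever or walking off the end of the buffer.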

