svn commit: r266423 - in head/sys: conf dev/i40e modules/i40e

Tue May 20 14:00:38 UTC 2014

On Mon, 19 May 2014, Rui Paulo wrote:

> On 18 May 2014, at 18:41, Julian Elischer <julian at freebsd.org> wrote:
>
>> On 5/19/14, 9:21 AM, Jack F Vogel wrote:
>>> Author: jfv
>>> Date: Mon May 19 01:21:02 2014
>>> New Revision: 266423
>>> URL: http://svnweb.freebsd.org/changeset/base/266423
>>>
>>> Log:
>>>   This is the beta release of the driver for the new
>>>   Intel 40G Ethernet Controller XL710 Family. This is
>>>   the core driver, a VF driver called i40evf, will be
>>>   following soon. Questions or comments to myself or
>>>   my co-developer Eric Joyner. Cheers!
>> love the name..
>
> Aesthetics aside, I think the name should be changed.  Network drivers always used [a-z] for name and [0-9] for unit.  Can you find an example where this is not true?

Also, verbose names break formatting.  E.g., netstat -r has 5 columns
available under Netif for the driver name and device number.  netstat
-i has about the same under Name (possibly 1 or 2 not directly under
Name, but reserved for the Name column).  systat has 3 columns
available, but with a more flexible format that truncates other info.
All driver name+numbers are broken now on freefall:

     2 users    Load  0.07  0.04  0.01                  May 20 11:46

Mem:KB    REAL            VIRTUAL                       VN PAGER   SWAP PAGER
         Tot   Share      Tot    Share    Free           in   out     in   out
Act   27100    5768   566228     9656 2578420  count
All  11821k    6964 1075549k    22040          pages
Proc:                                                            Interrupts
   r   p   d   s   w   Csw  Trp  Sys  Int  Sof  Flt        cow     130 total
              25       252    3   35    9   65             zfod        uart0 4
                                                           ozfod    44 cpu0:timer
  0.2%Sys   0.0%Intr  0.0%User  0.0%Nice 99.8%Idle        %ozfod     2 igb0:que 0
|    |    |    |    |    |    |    |    |    |    |       daefr     1 igb0:que 1
                                                           prcfr     1 igb0:que 2
                                            dtbuf          totfr     1 igb0:que 3
Namei     Name-cache   Dir-cache    484467 desvn          react     2 igb0:que 4
    Calls    hits   %    hits   %    309999 numvn          pdwak     1 igb0:que 5
        3       3 100                120414 frevn          pdpgs     1 igb0:que 6
                                                           intrn     1 igb0:que 7
Disks  ada0  ada1  ada2  ada3 pass0 pass1 pass2  20279528 wire        igb0:link
KB/t   0.00  0.00  0.00  0.00  0.00  0.00  0.00    179464 act         ahci0 274
tps       0     0     0     0     0     0     0   1323416 inact    48 cpu1:timer
MB/s   0.00  0.00  0.00  0.00  0.00  0.00  0.00    532604 cache     1 cpu9:timer
%busy     0     0     0     0     0     0     0   2045816 free      2 cpu23:time
                                                   1694240 buf       1 cpu11:time
                                                                     1 cpu22:time

systat is hard-coded for 80 columns (it can handle more than 25 rows but
only uses then to display more interrupts).  igb0 is already 1 too long.
"igb0: que N" is 8 too long.  This destroys all the space meant for
showing the interrupt number.  "que N" is almost as good.

The multitude of igb queues (handling a whole 1 interrupt/second each)
also defeats sysstat's vertical formatting.  Fortunately there are almost
no other peripheral devices that ever interrupted on freefall, (just uart
and ahci0), so their interrupts aren't pushed off the display.  uart and
ahci have even more verbose names than igb.  Only most of the cpuN:timer
interrupts are pushed off.  cpuN:timer is an even more verbose name than
uart and ahci.

top is considerably more broken with 80x25 but is not completely hard-coded
for 80 columns.  -SH output is now almost useless on freefall:

last pid: 23715;  load averages:  0.02,  0.05,  0.01   up 15+06:28:30  12:05:06
436 processes: 25 running, 356 sleeping, 55 waiting
CPU:  0.1% user,  0.0% nice,  0.1% system,  0.0% interrupt, 99.9% idle
Mem: 176M Active, 1293M Inact, 19G Wired, 520M Cache, 1655M Buf, 1996M Free
ARC: 15G Total, 7008M MFU, 4317M MRU, 18K Anon, 536M Header, 3240M Other
Swap: 8192M Total, 10M Used, 8182M Free

   PID USERNAME      PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
    11 root          155 ki31     0K   384K CPU2    2 363.7H 100.00% idle{idle:
    11 root          155 ki31     0K   384K CPU3    3 363.7H 100.00% idle{idle:
    ...

Whenever there are more CPUs than rows, top -SH can't even display all
the idle threads.  Without -H, it doesn't display any interrupts, and
without -S it doesn't display any detail about interrupts.  47 rows
is just enough for all the CPUs and a couple of igb interrupts.

Non-curses top output (top -SH <number> >file) displays any number of rows
in a not so useful way of course.  Now I use it to show other problems:
- It truncates the bad verbose description "idle{idle:" as above
- there is a formatting error for "100.00%" that uses 1 more column than
   allocated and thus steals one from the COMMAND column.  100.00% used to
   be unusual for any process, but is now normal for idle threads
- too much space is wasted for the USERNAME column.  In some previous
   version, even more space was wasted.
- top uses the tty's number of columns for file output.  This is bug for
   bug compatible with ps.  The default for a file should be 80 columns,
   but changeable using something like the COLUMNS environment variable
   (not the tty's attribute).
- COLUMNS does work for top and ps.  ROWS works for systat.
- with COLUMNS=80, igb0 is truncated out of existence in top -SH 100 >file.
   Not igb's fault.
- with COLUMNS=114, igb0 is finally visible in top output (top -SH 100 >file):

    12 root          -92    -     0K   880K WAIT    0   1:03  0.00% intr{irq256:
igb0:que}
    12 root          -92    -     0K   880K WAIT    6   0:47  0.00% intr{irq262:
igb0:que}
    12 root          -92    -     0K   880K WAIT    2   0:34  0.00% intr{irq258:
igb0:que}

Something truncated "que N" to just the useless "que".  It is collateral
with the ABI breakage for tdname.
   (Process names were expanded from length 16 to 19, and the old space
   was reserved for ABI compatibility of struct kinfo_proc.  ki_comm is
   the new name and ki_ocomm was the old name. Interrupt names use length
   19.  The reserved space was abused for tdname.  Later, ki_ocomm was
   renamed to ki_tdname.  The tdname length needs to be at least as large
   as the command name so as to hold copies of the command name, since
   due to compatibility hacks related to the abuse, ki_tdname is used to
   return what should be in ki_comm and (?) vice versa.  Here we see the
   results of truncation from this.  The full interrupt name is
   "irq256:igb0 que N".  This has length 18, so it would fit in ki_ocomm,
   but it is truncated to length 16, giving "irq256:igb0 que" when it is
   returned in ki_tdname.  When showing threads, top and ps just increase
   the mess by truncating ki_comm and ki_tdname and adding braces to waste
   even more space.  ki_comm provides no extra info for interrupts and
   some other threads.  It just says "intr" and doesn't need 19 characters
   for this.  "intr" is repeated as "irq" in ki_tdname.  ki_tdname needs
   more space for details but has less.)

Previous regressions changed interrupt command names from something like
"igb0 que 3 irq256" on i386 to "irq256:igb0 que 3" on all arches (move
irqN: from last to first and add punctuation).  This unimproved truncated
cases since it is better to truncate the irq number than anything else.)

systat -v works around some of these bugs as follows:
- it doesn't use kinfo_proc or ki_tdname.  It fetches the interrupt name
   by a different mechanism, so it is not truncated
- it edits the interrupt name to put the irq number at the end, to remove
   punctuation that would waste space.  Then if truncation would still
   occur, it removes "irq"
Much more complicated editing would be needed to parse ki_comm and ki_tdname
and discard redundant and space-wasting bits.  This is too hard.  The kernel
should be more careful to make the strings directly usable.  Interrupt names
for top and ps could be fixed by expanding ki_tdname (breaking the ABI again
:-() and putting the full interrupt name in it, and putting nothing instead
of "intr" in ki_comm.  Or maybe do any concatenation in the kernel, depending
on a syscall parameter, and never using ki_tdname.

Other wasteful redundancies in ki_comm{ki_tdname}:

% idle{idle: cpuN}
% geom{g_down}

Not too bad, but g_ is an improved spelling of geom.

% intr{swi4: clock}

Lots of these, with the important info truncated in several steps.  Old
versions of top displayed something like "swi4: clock clk:cy+" for the
non-threads case and "swi4: clock clk" for the threads case.  This
was from when kernel threads were separate processes.  Several drivers
are attached to "swi4: clock" (clk, cy, sio at least).  19 characters
in the interrupt name is too short, so the name is truncated.  (This is
a regression from previous version written by me.  That used a string
space so that the length was unlimited but combined length was several
times smaller since most names are short.)  At least it writes "+" to
indicate the truncation.  The "+" is then lost by blind truncation on
copying to ki_tdname, so the truncation is not obvious in top and ps.

systat and vmstat never supported SWIs, so the usual method of seeing
the non-truncated non-edited names doesn't work (it is to use vmstat -i).
systat would run out of vertical space showing SWIs.

% kernel{swapper}
% kernel{zio_null_issue}

"kernel" for kernel threads is almost as useless as "intr".  The full
thread name in ki_tdname says more.

% zfskern{arc_reclaim_thre}

This and other zfskern names are the only examples of good ki_comm/ki_tdname
decompositions.  "zfs*" is not repeated in ki_tdname, so only "kern" in
ki_comm becomes redundant when ki_kdname is appended.

Bruce