kern/128840: page fault under load with igb/LRO

Andrew Gierth andrew at tao11.riddles.org.uk
Thu Nov 13 07:10:02 PST 2008


>Number:         128840
>Category:       kern
>Synopsis:       page fault under load with igb/LRO
>Confidential:   no
>Severity:       critical
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Thu Nov 13 15:10:01 UTC 2008
>Closed-Date:
>Last-Modified:
>Originator:     Andrew Gierth
>Release:        FreeBSD 7.1-PRERELEASE (2008-11-09)
>Organization:
>Environment:
FreeBSD redacted.example.com 7.1-PRERELEASE FreeBSD 7.1-PRERELEASE #0: Mon Nov 10 20:41:49 UTC 2008     root@:/usr/obj/usr/src/sys/REDACTED  amd64

>Description:
Kernel page fault caused by a NULL mbuf pointer (m=0x0) passed from tcp_lro_flush to ether_input:

(kgdb) where
#0  doadump () at pcpu.h:195
#1  0xffffffff80281888 in boot (howto=260)
    at /usr/src/sys/kern/kern_shutdown.c:418
#2  0xffffffff80281cec in panic (fmt=Variable "fmt" is not available.
) at /usr/src/sys/kern/kern_shutdown.c:574
#3  0xffffffff803c91c3 in trap_fatal (frame=0xc, eva=Variable "eva" is not available.
)
    at /usr/src/sys/amd64/amd64/trap.c:764
#4  0xffffffff803c95a4 in trap_pfault (frame=0xfffffffface6f9f0, usermode=0)
    at /usr/src/sys/amd64/amd64/trap.c:680
#5  0xffffffff803c9efa in trap (frame=0xfffffffface6f9f0)
    at /usr/src/sys/amd64/amd64/trap.c:449
#6  0xffffffff803aee3e in calltrap ()
    at /usr/src/sys/amd64/amd64/exception.S:209
#7  0xffffffff8031eadf in ether_input (ifp=0xffffff00010bb800, m=0x0)
    at /usr/src/sys/net/if_ethersubr.c:531
#8  0xffffffff8034779b in tcp_lro_flush (cntl=0xffffff000120a258, 
    lro=0xffffff00036e4000) at /usr/src/sys/netinet/tcp_lro.c:168
#9  0xffffffff801c4d07 in igb_rxeof (rxr=0xffffff000120a258, count=73)
    at /usr/src/sys/dev/e1000/if_igb.c:4018
#10 0xffffffff801c4ffb in igb_handle_rx (context=0xffffff000120a200, pending=Variable "pending" is not available.
)
    at /usr/src/sys/dev/e1000/if_igb.c:1337
#11 0xffffffff802b796d in taskqueue_run (queue=0xffffff0002400800)
    at /usr/src/sys/kern/subr_taskqueue.c:282
#12 0xffffffff802b7c32 in taskqueue_thread_loop (arg=Variable "arg" is not available.
)
    at /usr/src/sys/kern/subr_taskqueue.c:401
#13 0xffffffff8025ec2f in fork_exit (
    callout=0xffffffff802b7bc0 <taskqueue_thread_loop>, 
    arg=0xffffff00011584d8, frame=0xfffffffface6fc80)
    at /usr/src/sys/kern/kern_fork.c:804
#14 0xffffffff803af20e in fork_trampoline ()
    at /usr/src/sys/amd64/amd64/exception.S:455

[...]

#8  0xffffffff8034779b in tcp_lro_flush (cntl=0xffffff000120a258, 
    lro=0xffffff00036e4000) at /usr/src/sys/netinet/tcp_lro.c:168
168             (*ifp->if_input)(cntl->ifp, lro->m_head);
(kgdb) print lro
$1 = (struct lro_entry *) 0xffffff00036e4000
(kgdb) print *lro
$2 = {next = {sle_next = 0xffffff00036e3c80}, m_head = 0x0, 
  m_tail = 0xffffff0003a96900, timestamp = 0, ip = 0xffffff0003aac810, 
  tsval = 87166632, tsecr = 1844070041, source_ip = 4124597842, 
  dest_ip = 4107820626, next_seq = 2241419788, ack_seq = 1871884633, 
  len = 122, data_csum = 53193, window = 22336, source_port = 24564, 
  dest_port = 14357, append_cnt = 0, mss = 56}

Note that m_head == NULL while m_tail is still set. That NULL m_head is what ether_input receives as m in frame #7, hence the crash.
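
To make the failure mode concrete, here is a minimal userspace sketch of the call at tcp_lro.c:168. It is not the kernel code and not a proposed fix: the struct layouts are cut-down stand-ins, and model_if_input and model_lro_flush_guarded are hypothetical names invented for illustration. It only shows that an input routine which dereferences its mbuf argument unconditionally will fault when handed an entry whose m_head is NULL, and that a check at the call site could turn the fault into a detectable condition.

```c
#include <stddef.h>
#include <stdio.h>

/* Cut-down stand-ins for the kernel types; NOT the real definitions. */
struct mbuf {
	int len;
};

struct ifnet {
	int delivered;		/* bytes handed to the input routine */
};

struct lro_entry {
	struct mbuf *m_head;
	struct mbuf *m_tail;
};

/*
 * Hypothetical stand-in for ifp->if_input (ether_input in the trace).
 * Like the real routine, it dereferences m without checking for NULL.
 */
static void
model_if_input(struct ifnet *ifp, struct mbuf *m)
{
	ifp->delivered += m->len;
}

/*
 * Hypothetical guarded variant of the flush step.
 * Returns 1 if the chain was delivered, 0 if the entry had no head.
 */
static int
model_lro_flush_guarded(struct ifnet *ifp, struct lro_entry *lro)
{
	if (lro->m_head == NULL) {
		/* The state seen in the dump: m_head NULL, m_tail set. */
		fprintf(stderr, "lro entry %p: NULL m_head, not delivered\n",
		    (void *)lro);
		return (0);
	}
	model_if_input(ifp, lro->m_head);
	lro->m_head = NULL;
	lro->m_tail = NULL;
	return (1);
}
```

A check like this would only mask the symptom. It does not explain how an entry ends up on the active list with m_head cleared while m_tail is still set, which is the actual bug to be found.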

>How-To-Repeat:
I hit this while running PostgreSQL's "pgbench" benchmark with 100 concurrent connections from a remote host over a gigabit Ethernet network. The workload is request/response with relatively small requests and responses. The crash occurred after several minutes of load, and it was the server side that crashed.

Repeating the same workload (and heavier versions of it) with hw.igb.enable_lro=0 did not produce any crashes.

>Fix:


>Release-Note:
>Audit-Trail:
>Unformatted:

