kern/128840: page fault under load with igb/LRO
Andrew Gierth
andrew at tao11.riddles.org.uk
Thu Nov 13 07:10:02 PST 2008
>Number: 128840
>Category: kern
>Synopsis: page fault under load with igb/LRO
>Confidential: no
>Severity: critical
>Priority: medium
>Responsible: freebsd-bugs
>State: open
>Quarter:
>Keywords:
>Date-Required:
>Class: sw-bug
>Submitter-Id: current-users
>Arrival-Date: Thu Nov 13 15:10:01 UTC 2008
>Closed-Date:
>Last-Modified:
>Originator: Andrew Gierth
>Release: FreeBSD 7.1-PRERELEASE (2008-11-09)
>Organization:
>Environment:
FreeBSD redacted.example.com 7.1-PRERELEASE FreeBSD 7.1-PRERELEASE #0: Mon Nov 10 20:41:49 UTC 2008 root@:/usr/obj/usr/src/sys/REDACTED amd64
>Description:
Kernel page fault due to null pointer passed from tcp_lro_flush to ether_input:
(kgdb) where
#0 doadump () at pcpu.h:195
#1 0xffffffff80281888 in boot (howto=260)
at /usr/src/sys/kern/kern_shutdown.c:418
#2 0xffffffff80281cec in panic (fmt=Variable "fmt" is not available.
) at /usr/src/sys/kern/kern_shutdown.c:574
#3 0xffffffff803c91c3 in trap_fatal (frame=0xc, eva=Variable "eva" is not available.
)
at /usr/src/sys/amd64/amd64/trap.c:764
#4 0xffffffff803c95a4 in trap_pfault (frame=0xfffffffface6f9f0, usermode=0)
at /usr/src/sys/amd64/amd64/trap.c:680
#5 0xffffffff803c9efa in trap (frame=0xfffffffface6f9f0)
at /usr/src/sys/amd64/amd64/trap.c:449
#6 0xffffffff803aee3e in calltrap ()
at /usr/src/sys/amd64/amd64/exception.S:209
#7 0xffffffff8031eadf in ether_input (ifp=0xffffff00010bb800, m=0x0)
at /usr/src/sys/net/if_ethersubr.c:531
#8 0xffffffff8034779b in tcp_lro_flush (cntl=0xffffff000120a258,
lro=0xffffff00036e4000) at /usr/src/sys/netinet/tcp_lro.c:168
#9 0xffffffff801c4d07 in igb_rxeof (rxr=0xffffff000120a258, count=73)
at /usr/src/sys/dev/e1000/if_igb.c:4018
#10 0xffffffff801c4ffb in igb_handle_rx (context=0xffffff000120a200, pending=Variable "pending" is not available.
)
at /usr/src/sys/dev/e1000/if_igb.c:1337
#11 0xffffffff802b796d in taskqueue_run (queue=0xffffff0002400800)
at /usr/src/sys/kern/subr_taskqueue.c:282
#12 0xffffffff802b7c32 in taskqueue_thread_loop (arg=Variable "arg" is not available.
)
at /usr/src/sys/kern/subr_taskqueue.c:401
#13 0xffffffff8025ec2f in fork_exit (
callout=0xffffffff802b7bc0 <taskqueue_thread_loop>,
arg=0xffffff00011584d8, frame=0xfffffffface6fc80)
at /usr/src/sys/kern/kern_fork.c:804
#14 0xffffffff803af20e in fork_trampoline ()
at /usr/src/sys/amd64/amd64/exception.S:455
[...]
#8 0xffffffff8034779b in tcp_lro_flush (cntl=0xffffff000120a258,
lro=0xffffff00036e4000) at /usr/src/sys/netinet/tcp_lro.c:168
168 (*ifp->if_input)(cntl->ifp, lro->m_head);
(kgdb) print lro
$1 = (struct lro_entry *) 0xffffff00036e4000
(kgdb) print *lro
$2 = {next = {sle_next = 0xffffff00036e3c80}, m_head = 0x0,
m_tail = 0xffffff0003a96900, timestamp = 0, ip = 0xffffff0003aac810,
tsval = 87166632, tsecr = 1844070041, source_ip = 4124597842,
dest_ip = 4107820626, next_seq = 2241419788, ack_seq = 1871884633,
len = 122, data_csum = 53193, window = 22336, source_port = 24564,
dest_port = 14357, append_cnt = 0, mss = 56}
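Note that lro->m_head == NULL while m_tail is still set, so tcp_lro_flush passes a null mbuf to ether_input (m=0x0 in frame #7), hence the crash.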
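To make the failure mode concrete, here is a minimal userland sketch of a flush routine that refuses to hand a NULL head mbuf to the input routine. It is an illustration only, not the kernel code and not a proposed fix: the struct definitions are cut-down stand-ins, and fake_if_input, lro_flush_guarded and the two counters are hypothetical names invented for this example. A guard like this would only hide the symptom; it does not explain how an entry ends up with m_head == NULL and m_tail still set.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; only the fields used here. */
struct mbuf {
	int len;
};

struct lro_entry {
	struct mbuf *m_head;
	struct mbuf *m_tail;
};

static int delivered;	/* packets handed to the input routine */
static int skipped;	/* entries rejected because m_head was NULL */

/*
 * Stand-in for (*ifp->if_input)().  Like ether_input, it dereferences
 * the mbuf, so calling it with m == NULL would fault.
 */
static void
fake_if_input(struct mbuf *m)
{
	if (m->len >= 0)
		delivered++;
}

/*
 * Guarded flush (assumption: illustrative only).  Returns 0 if the
 * entry was delivered, -1 if it had no head mbuf and was skipped.
 */
static int
lro_flush_guarded(struct lro_entry *lro)
{
	assert(lro != NULL);

	if (lro->m_head == NULL) {
		skipped++;
		return (-1);
	}
	fake_if_input(lro->m_head);
	lro->m_head = NULL;
	lro->m_tail = NULL;
	return (0);
}
```

With an entry shaped like the one printed above (m_head NULL, m_tail non-NULL), lro_flush_guarded returns -1 instead of dereferencing a null pointer; an entry with a valid head is delivered as usual.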
>How-To-Repeat:
I got this from running PostgreSQL's "pgbench" benchmark with 100 concurrent connections from a remote host (over a gigE network). This is a request/response workload with relatively small requests and responses; the crash occurred after several minutes of load. The server side was the one that crashed.
Repeating the same workload (and heavier versions of it) with hw.igb.enable_lro=0 did not produce any crashes.
>Fix:
>Release-Note:
>Audit-Trail:
>Unformatted: