kernel bug in 11.3-STABLE causes frequent crashes
Scott Bennett
bennett at sdf.org
Sat Nov 9 13:56:50 UTC 2019
Eugene,
Thank you very much for the fast reply!
Eugene Grosbein <eugen at grosbein.net> wrote:
> 09.11.2019 19:45, Scott Bennett wrote:
> > The rest of this message was posted a little while ago to the
> > freebsd-questions list by mistake. It was intended for freebsd-stable,
> > so I am posting it here now after posting a brief apology on the other
> > list.
> > I have had to waste a great deal of time lately in recovering my
> > system from crashes due to a kernel bug. At present, my system is
> >
> > FreeBSD hellas 11.3-STABLE FreeBSD 11.3-STABLE #12 r352571: Sat Sep 21 11:39:52 CDT 2019 bennett at hellas:/usr/obj/usr/src/sys/hellas amd64
> >
> > There are actually at least two problems, but this particular one has been
> > causing a large portion of my forced reboots. It usually fails to produce
> > a dump and freezes right after the panic and backtrace messages, as it did
> > earlier tonight, but Wednesday night it did create a dump, which I am
> > keeping in case it should prove helpful in getting the bug identified and
> > solved. I copied the console messages to paper painstakingly by hand.
> > They appear to be identical each time, except, of course, for the messages
> > that a dump is produced when, indeed, it does produce one. I am omitting
> > those fairly standard messages.
> >
> > Fatal trap 12: page fault while in kernel mode
> > cpuid = 2; apic id = 02
> > fault virtual address = 0x3b8
> > fault code = supervisor read data, page not present
> > instruction pointer = 0x20:0xffffffff80a4b14c
> > stack pointer = 0x0:0xfffffe012a60ea50
> > frame pointer = 0x0:0xfffffe012a60eae0
> > code segment = base 0x0, limit 0xfffff, type 0x1b
> > = DPL 0, pres 1, long 1, def32 0, gran 1
> > processor eflags = interrupt enabled, resume, IOPL = 0
> > current process = 28 (flowcleaner)
> > trap number = 12
> > panic: page fault
> > cpuid = 2
> > KDB: stack backtrace:
> > #0 0xffffffff80a94707 at kdb_backtrace+0x67
> > #1 0xffffffff80a4fa2e at vpanic+0x17e
> > #2 0xffffffff80a4f8a3 at panic+0x43
> > #3 0xffffffff80f3a4d0 at trap_pfault+0
> > #4 0xffffffff80f3a519 at trap_pfault+0x49
> > #5 0xffffffff80f39bad at trap+0x29
> > #6 0xffffffff80f19f33 at calltrap+0x8
> > #7 0xffffffff80b3bb8d at flowtable_clean_vnet+0x43d
> > #8 0xffffffff80b3c758 at flowtable_cleaner+0xc8
> > #9 0xffffffff80a12ea2 at fork_exit+0x82
> > #10 0xffffffff80f1af4e at fork_trampoline+0xe
> >
> > The machine is ancient. The CPU is a QX9650 (last group of Core 2
> > Quads) with 8 GB of DDR3 memory.
> > If this can be identified as a known bug and a clue provided to a
> > patch or a safer version to upgrade to, I would be grateful. I am getting
> > very, very tired of these crashes.
> > The other forced reboots I will describe in a separate message, but
> > that problem has existed since the time of 11.2-RELEASE and apparently was
> > never investigated, much less fixed, although people began complaining on
> > this list and possibly -questions within the first few days after the
> > release date.
> > Thanks in advance for any help with this problem!
>
> It seems you have a custom kernel with options FLOWTABLE. The code it includes
> is known to be buggy; this option was removed from GENERIC many releases ago.
> Remove it from your kernel configuration, rebuild the kernel, and you will be fine.
>
Wonderful. I have a comment on that line, saying I added it for 8.x, so I
probably found it in 8.1's GENERIC configuration file when I was preparing to upgrade
from 7.3. It is interesting that it only started hitting me (hard enough to make
me notice it, at least) in 11.3 and maybe a bit earlier in 11.2. Anyway, that will
be easy enough to fix, but will require rolling /usr/src back to the revision I am
running, which is probably also no big deal.
I don't seem to be able to build it at the current source revision because
11-STABLE's buildworld began failing during the libc build two or three weeks ago.
I just tried "svn update /usr/src" again, followed by "make -j6 buildworld", and it
still fails, ending like this:
--- libc_pic.a ---
ranlib -D libc_pic.a
--- libc.a ---
ranlib -D libc.a
--- libc.so.7.full ---
cc: error: unable to execute command: posix_spawn failed: Permission denied
cc: error: linker command failed with exit code 1 (use -v to see invocation)
*** [libc.so.7.full] Error code 1
make[4]: stopped in /usr/src/lib/libc
1 error
make[4]: stopped in /usr/src/lib/libc
*** [lib/libc__L] Error code 2
make[3]: stopped in /usr/src
1 error
make[3]: stopped in /usr/src
*** [libraries] Error code 2
make[2]: stopped in /usr/src
1 error
make[2]: stopped in /usr/src
*** [_libraries] Error code 2
make[1]: stopped in /usr/src
1 error
make[1]: stopped in /usr/src
*** [buildworld] Error code 2
make: stopped in /usr/src
1 error
make: stopped in /usr/src
Oh, well. During the intervening weeks, I haven't seen any src updates that
appear to have anything to do with fixing the virtual memory management bug(s)
that is/are the other thing wasting my time. I'll start a separate thread for
that, but first I want to do the rollback and get the buildworld started. Oh,
wait a minute...ah, yes! I also have a snapshot of /usr/obj to the same revision,
so I won't even need the buildworld, only the buildkernel. This should be quite
quick then.
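For reference, the plan above (roll back, drop FLOWTABLE, rebuild only the kernel) might be sketched roughly as follows. This is a minimal sketch, not a tested recipe: the helper name comment_out_flowtable is hypothetical, and the config file location /usr/src/sys/amd64/conf/hellas is an assumption based on the kernel name in the uname line quoted earlier. The build steps are shown only as comments.

```sh
#!/bin/sh
# Sketch only: comment out "options FLOWTABLE" in a kernel config file.
# The edited config is printed on stdout; the input file is left untouched.
# (comment_out_flowtable is a hypothetical helper name.)
comment_out_flowtable() {
    sed -E 's/^([[:space:]]*options[[:space:]]+FLOWTABLE([[:space:]].*)?)$/#\1/' "$1"
}

# Assumed workflow, not executed here. The revision number and the kernel
# name "hellas" come from the uname line; the config path is a guess.
#   svn update -r 352571 /usr/src
#   comment_out_flowtable /usr/src/sys/amd64/conf/hellas > /tmp/hellas.new
#   (inspect /tmp/hellas.new, then copy it over the original config)
#   cd /usr/src && make buildkernel KERNCONF=hellas
#   make installkernel KERNCONF=hellas
```

Writing the edited config to a separate file first leaves the original in place until the change has been looked over.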
Thanks a bundle for your help.
Scott Bennett, Comm. ASMELG, CFIAG
**********************************************************************
* Internet: bennett at sdf.org *xor* bennett at freeshell.org *
*--------------------------------------------------------------------*
* "A well regulated and disciplined militia, is at all times a good *
* objection to the introduction of that bane of all free governments *
* -- a standing army." *
* -- Gov. John Hancock, New York Journal, 28 January 1790 *
**********************************************************************