drm-current-kmod-4.16.g20190424 hangs

Johannes Lundberg johalun0 at gmail.com
Fri Apr 26 15:23:11 UTC 2019


From https://reviews.freebsd.org/D19845: Can you try the sysctls
suggested there?

    In D19845#428854 <https://reviews.freebsd.org/D19845#428854>,
    @tychon <https://reviews.freebsd.org/p/tychon/> wrote:

        In D19845#428768 <https://reviews.freebsd.org/D19845#428768>,
        @greg_unrelenting.technology
        <https://reviews.freebsd.org/p/greg_unrelenting.technology/> wrote:

        Some more i915 GPU testing (w/o the latest update here): after
        using Firefox (opengl layers, xwayland) for some time, GPU
        resets start happening

        drmn0: Resetting chip for stuck wait on rcs0
        drmn0: Resetting chip for stuck wait on rcs0
        drmn0: Resetting chip for stuck wait on rcs0
        …
        DMAR0: Fault Overflow
        DMAR0: vgapci0: pci:0:2:0 sid 10 fault acc 0 adt 0x0 reason 0x5 addr 2e09000
        DMAR0: Fault Overflow
        DMAR0: vgapci0: pci:0:2:0 sid 10 fault acc 0 adt 0x0 reason 0x5 addr 2e09000

        and eventually the whole system freezes if I don't quit the
        compositor / switch to vt console.

    Looks like a symptom of non-translatable physical address. I've
    encountered drivers which need additional work outside of the scope
    of this effort. Perhaps this is the case there as I can't find any more
    cases in the Linux KPI where a physical address is substituted for a
    DMA one.
    Also, I assume this is in remap mode. Does it work in identity map
    mode (hw.busdma.default="bounce")? Unless there is an API which
    escaped, if it works in hw.dmar.enable="0" it's not a regression
    from before :-/



On 4/26/19 8:13 AM, Jakob Alvermark wrote:
> Sure:
>
> ---<<BOOT>>---
> Copyright (c) 1992-2019 The FreeBSD Project.
> Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
>     The Regents of the University of California. All rights reserved.
> FreeBSD is a registered trademark of The FreeBSD Foundation.
> FreeBSD 13.0-CURRENT #194 r346736M: Fri Apr 26 12:26:20 CEST 2019
>     root@flyer:/usr/obj/usr/src/amd64.amd64/sys/FLYER amd64
> FreeBSD clang version 8.0.0 (tags/RELEASE_800/final 356365) (based on
> LLVM 8.0.0)
> VT(efifb): resolution 1366x768
> CPU: Intel(R) Pentium(R) CPU  N3540  @ 2.16GHz (2166.72-MHz K8-class CPU)
>   Origin="GenuineIntel"  Id=0x30678  Family=0x6  Model=0x37 Stepping=8
> Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
>
> Features2=0x41d8e3bf<SSE3,PCLMULQDQ,DTES64,MON,DS_CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,SSE4.1,SSE4.2,MOVBE,POPCNT,TSCDLT,RDRAND>
>
>   AMD Features=0x28100800<SYSCALL,NX,RDTSCP,LM>
>   AMD Features2=0x101<LAHF,Prefetch>
>   Structured Extended Features=0x2282<TSCADJ,SMEP,ERMS,NFPUSG>
>   VT-x: PAT,HLT,MTF,PAUSE,EPT,UG,VPID
>   TSC: P-state invariant, performance statistics
> real memory  = 8589934592 (8192 MB)
> avail memory = 8120422400 (7744 MB)
> Event timer "LAPIC" quality 600
> ACPI APIC Table: <ACRSYS ACRPRDCT>
> WARNING: L1 data cache covers fewer APIC IDs than a core (0 < 1)
> FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
> FreeBSD/SMP: 1 package(s) x 4 core(s)
> __stack_chk_init: WARNING: Initializing stack protection with
> non-random cookies!
> __stack_chk_init: WARNING: This severely limits the benefit of
> -fstack-protector!
> ioapic0: Changing APIC ID to 2
> ioapic0 <Version 2.0> irqs 0-86 on motherboard
> Launching APs: 2 3 1
> Timecounter "TSC-low" frequency 1083359641 Hz quality 1000
> Cuse v0.1.36 @ /dev/cuse
> random: entropy device external interface
> kbd1 at kbdmux0
> module_register_init: MOD_LOAD (vesa, 0xffffffff81150570, 0) error 19
> random: registering fast source Intel Secure Key RNG
> random: fast provider: "Intel Secure Key RNG"
> 000.000049 [4254] netmap_init               netmap: loaded module
> [ath_hal] loaded
> nexus0
> efirtc0: <EFI Realtime Clock> on motherboard
> efirtc0: registered as a time-of-day clock, resolution 1.000000s
> cryptosoft0: <software crypto> on motherboard
> acpi0: <ACRSYS ACRPRDCT> on motherboard
> acpi0: Power Button (fixed)
> unknown: I/O range not supported
> cpu0: <ACPI CPU> on acpi0
> atrtc0: <AT realtime clock> port 0x70-0x77 on acpi0
> atrtc0: Warning: Couldn't map I/O.
> atrtc0: registered as a time-of-day clock, resolution 1.000000s
> Event timer "RTC" frequency 32768 Hz quality 0
> hpet0: <High Precision Event Timer> iomem 0xfed00000-0xfed003ff irq 8
> on acpi0
> Timecounter "HPET" frequency 14318180 Hz quality 950
> Event timer "HPET" frequency 14318180 Hz quality 450
> Event timer "HPET1" frequency 14318180 Hz quality 440
> Event timer "HPET2" frequency 14318180 Hz quality 440
> attimer0: <AT timer> port 0x40-0x43,0x50-0x53 irq 0 on acpi0
> Timecounter "i8254" frequency 1193182 Hz quality 0
> Event timer "i8254" frequency 1193182 Hz quality 100
> Timecounter "ACPI-safe" frequency 3579545 Hz quality 850
> acpi_timer0: <24-bit timer at 3.579545MHz> port 0x408-0x40b on acpi0
> acpi_ec0: <Embedded Controller: GPE 0x18> port 0x62,0x66 on acpi0
> pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
> pcib0: Length mismatch for 3 range: 109fffff vs 10a00000
> pci0: <ACPI PCI bus> on pcib0
> vgapci0: <VGA-compatible display> port 0x2050-0x2057 mem
> 0x90000000-0x903fffff,0x80000000-0x8fffffff at device 2.0 on pci0
> vgapci0: Boot video device
> ahci0: <AHCI SATA controller> port
> 0x2048-0x204f,0x205c-0x205f,0x2040-0x2047,0x2058-0x205b,0x2020-0x203f
> mem 0x9091e000-0x9091e7ff at device 19.0 on pci0
> ahci0: AHCI v1.30 with 2 3Gbps ports, Port Multiplier not supported
> ahcich0: <AHCI channel> at channel 0 on ahci0
> xhci0: <Intel BayTrail USB 3.0 controller> mem 0x90900000-0x9090ffff
> at device 20.0 on pci0
> xhci0: 32 bytes context size, 64-bit DMA
> xhci0: Port routing mask set to 0xffffffff
> usbus0 on xhci0
> usbus0: 5.0Gbps Super Speed USB v3.0
> pci0: <encrypt/decrypt> at device 26.0 (no driver attached)
> hdac0: <Intel BayTrail HDA Controller> mem 0x90910000-0x90913fff at
> device 27.0 on pci0
> pcib1: <ACPI PCI-PCI bridge> at device 28.0 on pci0
> pcib1: [GIANT-LOCKED]
> pcib2: <ACPI PCI-PCI bridge> at device 28.1 on pci0
> pcib2: [GIANT-LOCKED]
> pci1: <ACPI PCI bus> on pcib2
> iwn0: <Intel Centrino Advanced 6235> mem 0x90600000-0x90601fff at
> device 0.0 on pci1
> arc4random: WARNING: initial seeding bypassed the cryptographic random
> device because it was not yet seeded and the knob
> 'bypass_before_seeding' was enabled.
> pcib3: <ACPI PCI-PCI bridge> at device 28.2 on pci0
> pcib3: [GIANT-LOCKED]
> pci2: <ACPI PCI bus> on pcib3
> re0: <RealTek 8168/8111 B/C/CP/D/DP/E/F/G PCIe Gigabit Ethernet> port
> 0x1000-0x10ff mem 0x90500000-0x90500fff,0x90400000-0x90403fff at
> device 0.0 on pci2
> re0: Using 1 MSI-X message
> re0: ASPM disabled
> re0: Chip rev. 0x4c000000
> re0: MAC rev. 0x00000000
> miibus0: <MII bus> on re0
> rgephy0: <RTL8251/8153 1000BASE-T media interface> PHY 1 on miibus0
> rgephy0:  none, 10baseT, 10baseT-FDX, 10baseT-FDX-flow, 100baseTX,
> 100baseTX-FDX, 100baseTX-FDX-flow, 1000baseT-FDX,
> 1000baseT-FDX-master, 1000baseT-FDX-flow, 1000baseT-FDX-flow-master,
> auto, auto-flow
> re0: Using defaults for TSO: 65518/35/2048
> re0: Ethernet address: c4:54:44:d6:95:39
> re0: netmap queues/slots: TX 1/256, RX 1/256
> isab0: <PCI-ISA bridge> at device 31.0 on pci0
> isa0: <ISA bus> on isab0
> acpi_button0: <Power Button> on acpi0
> acpi_button1: <Sleep Button> on acpi0
> gpio0: <Intel Baytrail GPIO Controller> iomem 0xfed0c000-0xfed0cfff
> irq 49 on acpi0
> gpiobus0: <GPIO bus> on gpio0
> gpioc0: <GPIO controller> on gpio0
> gpio1: <Intel Baytrail GPIO Controller> iomem 0xfed0d000-0xfed0dfff
> irq 48 on acpi0
> gpiobus1: <GPIO bus> on gpio1
> gpioc1: <GPIO controller> on gpio1
> gpio2: <Intel Baytrail GPIO Controller> iomem 0xfed0e000-0xfed0efff
> irq 50 on acpi0
> gpiobus2: <GPIO bus> on gpio2
> gpioc2: <GPIO controller> on gpio2
> sdhci_acpi0: <Intel Bay Trail/Braswell eMMC 4.5/4.5.1 Controller>
> iomem 0x90a02000-0x90a02fff irq 44 on acpi0
> mmc0: <MMC/SD bus> on sdhci_acpi0
> sdhci_acpi1: <Intel Bay Trail/Braswell SDXC Controller> iomem
> 0x90a00000-0x90a00fff irq 47 on acpi0
> ig4iic_acpi0: <Designware I2C Controller> iomem 0x90a07000-0x90a07fff
> irq 32 on acpi0
> acpi_acad0: <AC Adapter> on acpi0
> battery0: <ACPI Control Method Battery> on acpi0
> acpi_lid0: <Control Method Lid Switch> on acpi0
> acpi_tz0: <Thermal Zone> on acpi0
> atkbdc0: <Keyboard controller (i8042)> port 0x60,0x64 irq 1 on acpi0
> atkbd0: <AT Keyboard> irq 1 on atkbdc0
> kbd0 at atkbd0
> atkbd0: [GIANT-LOCKED]
> psm0: <PS/2 Mouse> irq 12 on atkbdc0
> psm0: [GIANT-LOCKED]
> psm0: model Synaptics Touchpad, device ID 0
> uart0: <16550 or compatible> at port 0x3f8 irq 4 flags 0x10 on isa0
> coretemp0: <CPU On-Die Thermal Sensors> on cpu0
> est0: <Enhanced SpeedStep Frequency Control> on cpu0
> ZFS filesystem version: 5
> ZFS storage pool version: features support (5000)
> Timecounters tick every 1.000 msec
> hdacc0: <Realtek ALC283 HDA CODEC> at cad 0 on hdac0
> hdaa0: <Realtek ALC283 Audio Function Group> at nid 1 on hdacc0
> hdaa0: Coef 0x06 val 0x2104 -> 0x2100
> hdaa0: Coef 0x45 val 0xc429 -> 0xd429
> hdaa0: Coef 0x1b val 0x080b -> 0x0c2b
> hdaa0: Coef 0x32 val 0x4ea3 -> 0x4ea3
> pcm0: <Realtek ALC283 (Analog 2.0+HP/2.0)> at nid 20,33 and 18 on hdaa0
> hdacc1: <Intel (0x2882) HDA CODEC> at cad 2 on hdac0
> hdaa1: <Intel (0x2882) Audio Function Group> at nid 1 on hdacc1
> pcm1: <Intel (0x2882) (HDMI/DP 8ch)> at nid 4 on hdaa1
> ugen0.1: <0x8086 XHCI root HUB> at usbus0
> uhub0: <0x8086 XHCI root HUB, class 9/0, rev 3.00/1.00, addr 1> on usbus0
> ada0 at ahcich0 bus 0 scbus0 target 0 lun 0
> ada0: <SAMSUNG MZ7PD128HCFV-000H1 DXM01H0Q> ACS-2 ATA SATA 3.x device
> ada0: Serial Number S1MBNSAFA22012
> ada0: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
> ada0: Command Queueing enabled
> ada0: 122104MB (250069680 512 byte sectors)
> ada0: quirks=0x3<4K,NCQ_TRIM_BROKEN>
> mmc0: No compatible cards found on bus
> iicbus0: <Philips I2C bus> on ig4iic_acpi0
> iicsmb0: <SMBus over I2C bridge> on iicbus0
> smbus0: <System Management Bus> on iicsmb0
> Trying to mount root from zfs:flyer2/ROOT/default []...
> Root mount waiting for: usbus0
> uhub0: 7 ports with 7 removable, self powered
> Root mount waiting for: usbus0
> ugen0.2: <vendor 0x05e3 USB2.0 Hub> at usbus0
> uhub1 on uhub0
> uhub1: <vendor 0x05e3 USB2.0 Hub, class 9/0, rev 2.00/85.37, addr 1>
> on usbus0
> uhub1: 4 ports with 3 removable, self powered
> Root mount waiting for: usbus0
> ugen0.3: <vendor 0x8087 product 0x07da> at usbus0
> Root mount waiting for: usbus0
> ugen0.4: <Cisco-Linksys Compact Wireless-G USB Adapter> at usbus0
> ugen0.5: <SunplusIT INC. HD WebCam> at usbus0
> random: unblocking device.
> drmn0: <drmn> on vgapci0
> vgapci0: child drmn0 requested pci_enable_io
> [drm] Unable to create a private tmpfs mount, hugepage support will be
> disabled(-19).
> Successfully added WC MTRR for [0x80000000-0x8fffffff]: 0;
> [drm] Got stolen memory base 0x7b000000, size 0x4000000
> [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
> [drm] Driver supports precise vblank timestamp query.
> [drm] Connector VGA-1: get mode from tunables:
> [drm]   - kern.vt.fb.modes.VGA-1
> [drm]   - kern.vt.fb.default_mode
> [drm] Connector DP-1: get mode from tunables:
> [drm]   - kern.vt.fb.modes.DP-1
> [drm]   - kern.vt.fb.default_mode
> [drm] Connector HDMI-A-1: get mode from tunables:
> [drm]   - kern.vt.fb.modes.HDMI-A-1
> [drm]   - kern.vt.fb.default_mode
> [drm] Connector eDP-1: get mode from tunables:
> [drm]   - kern.vt.fb.modes.eDP-1
> [drm]   - kern.vt.fb.default_mode
> [drm] Initialized i915 1.6.0 20171222 for drmn0 on minor 0
> ichwd0: <Intel Bay Trail SoC watchdog timer> on isa0
> VT: Replacing driver "efifb" with new "fb".
> start FB_INFO:
> type=11 height=768 width=1366 depth=32
> cmsize=16 size=4227072
> pbase=0x80000000 vbase=0xfffff80080000000
> name=drmn0 flags=0x0 stride=5504 bpp=32
> cmap[0]=0 cmap[1]=7f0000 cmap[2]=7f00 cmap[3]=c4a000
> end FB_INFO
> drmn0: fb0: inteldrmfb frame buffer device
> wlan0: Ethernet address: 80:00:0b:5a:cd:23
> lo0: link state changed to UP
> iwn0: iwn_read_firmware: ucode rev=0x12a80601
> re0: link state changed to DOWN
> wlan0: link state changed to UP
> ubt0 on uhub1
> ubt0: <vendor 0x8087 product 0x07da, class 224/1, rev 2.00/78.69, addr
> 2> on usbus0
> rum0 on uhub1
> rum0: <Cisco-Linksys Compact Wireless-G USB Adapter, class 0/0, rev
> 2.00/0.01, addr 3> on usbus0
> rum0: MAC/BBP RT2573 (rev 0x2573a), RF RT2528
> WARNING: attempt to domain_add(bluetooth) after domainfinalize()
> WARNING: attempt to domain_add(netgraph) after domainfinalize()
> ubt0: ubt_bulk_read_callback:979: bulk-in transfer failed:
> USB_ERR_STALLED
> wlan1: Ethernet address: 00:18:f8:34:d2:8d
> wlan1: link state changed to UP
> .
> Security policy loaded: MAC/ntpd (mac_ntpd)
> .
> [drm] GPU HANG: ecode 7:0:0x86f2fffd, in Xorg [100491], reason: Hang
> on rcs0, action: reset
> drmn0: Resetting chip after gpu hang
> drmn0: i915_reset_device timed out, cancelling all in-flight rendering.
>
> On 2019-04-26 16:48, Johannes Lundberg wrote:
>> Hi
>>
>> Hmm, this is not good. The only thing I can think of is the dma changes
>> to base linuxkpi...
>>
>> Can you share a dmesg output from boot to crash, or at least to after
>> driver is loaded?
>>
>> Tycho, any ideas?
>>
>>
>> On 4/26/19 5:00 AM, Jakob Alvermark wrote:
>>> Hi,
>>>
>>>
>>> When I upgraded -current to r346730 drm-current-kmod-4.16.g20190323
>>> wouldn't load, "device_attach: drmn0 attach returned 19"
>>>
>>> So I upgraded drm-current-kmod to 4.16.g20190424.
>>>
>>> It loads fine, but shortly after starting Xorg the screen freezes.
>>>
>>> The only way out is pressing the power button, it shuts down cleanly.
>>>
>>> /var/log/messages shows this:
>>>
>>> kernel: [drm] GPU HANG: ecode 7:0:0x60ac6ee9, in Xorg [100385],
>>> reason: Hang on rcs0, action: reset
>>> kernel: drmn0: Resetting chip after gpu hang
>>> syslogd: last message repeated 1 times
>>> kernel: drmn0: i915_reset_device timed out, cancelling all in-flight
>>> rendering.
>>> kernel: .
>>>
>>> Tried once more, same thing happened:
>>>
>>> kernel: [drm] GPU HANG: ecode 7:0:0x86f2fffd, in Xorg [100491],
>>> reason: Hang on rcs0, action: reset
>>> kernel: drmn0: Resetting chip after gpu hang
>>> syslogd: last message repeated 1 times
>>> kernel: drmn0: i915_reset_device timed out, cancelling all in-flight
>>> rendering.
>>> kernel: .
>>>
>>> Reverting back to drm-current-kmod-4.16.g20190323 and -current to
>>> r346593 (yay boot environments!) it is stable.
>>>
>>> This is on a laptop with CPU: Intel(R) Pentium(R) CPU  N3540  @
>>> 2.16GHz (2166.72-MHz K8-class CPU)
>>> Baytrail graphics.
>>>
>>>
>>> Jakob
>>>
>>> _______________________________________________
>>> freebsd-x11 at freebsd.org mailing list
>>> https://lists.freebsd.org/mailman/listinfo/freebsd-x11
>>> To unsubscribe, send any mail to "freebsd-x11-unsubscribe at freebsd.org"

