From nobody Sun Nov 13 19:20:30 2022 X-Original-To: hackers@mlmmj.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id 4N9MjJ2Ckkz4dKdB for ; Sun, 13 Nov 2022 19:20:36 +0000 (UTC) (envelope-from markjdb@gmail.com) Received: from mail-il1-x136.google.com (mail-il1-x136.google.com [IPv6:2607:f8b0:4864:20::136]) (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256 client-signature RSA-PSS (2048 bits) client-digest SHA256) (Client CN "smtp.gmail.com", Issuer "GTS CA 1D4" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 4N9MjG74rRz4649 for ; Sun, 13 Nov 2022 19:20:34 +0000 (UTC) (envelope-from markjdb@gmail.com) Authentication-Results: mx1.freebsd.org; dkim=pass header.d=gmail.com header.s=20210112 header.b=QWtN6vky; spf=pass (mx1.freebsd.org: domain of markjdb@gmail.com designates 2607:f8b0:4864:20::136 as permitted sender) smtp.mailfrom=markjdb@gmail.com; dmarc=none Received: by mail-il1-x136.google.com with SMTP id d3so4824264ils.1 for ; Sun, 13 Nov 2022 11:20:34 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=in-reply-to:content-transfer-encoding:content-disposition :mime-version:references:message-id:subject:cc:to:from:date:sender :from:to:cc:subject:date:message-id:reply-to; bh=uV1vjXwLY+zdHEOlKO4J9VE5WVOSTyQwKN0YeAiQZy4=; b=QWtN6vkyrrA+3JxMquIJri6svVp3eflbpSYpMJmwCU9X7uxF5IsrLFrTYL6AAeJHjt t96oF9ciWx6+kh1rqqwYQkMLB5PA28wouV7Pr+ACxI0RjsPHZq5u4aFR6MDYUAjjcpz8 pgHGQ1nh9SupSvQlqd7J5W55T7qi9PFoREpf1qAkYOsDOyLEjTOesUG9kO2GE32jM2Bq CyROQTd/HOZzW7zTSLiatFqgQ16X91+SNUEV5dAX63Tgft+/2ftEa1W6uD7rPqP2AjqD F2/1kVRxejod8zYdrt5rgopJjxAhPO6bZnxy8Cr9kHcBemtT/X9uXOaDvnmnPQAV9dmA ZrbQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=in-reply-to:content-transfer-encoding:content-disposition :mime-version:references:message-id:subject:cc:to:from:date:sender :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=uV1vjXwLY+zdHEOlKO4J9VE5WVOSTyQwKN0YeAiQZy4=; b=evSir/jSFJmpnyf5Fiawc1FJJrRUSomJUtd4h4kE8PkVcccfQjkTPJzPHX8QKbhXVj vaMCKrDW8ATIsOm6GPFnBmlDw9s9CiOaVzanDjvc6I2y1Uc2Lm4VOaQwYl13/kYg5Cqf qracH4G2FwnW+5loPPtlsj7fAQc1p37pFHBp1NsecoWTSF/Uz1GKS/S7HT0Q32cxsl/3 pcjrgZycR+g7VJrLpNqjwmCb7AdQwXnbECVJaBcDh+UrEckKGa4S1wmwUMu6jsAwag4P ZJI0VrN1ooclhi3LlwF3Uv6oF4ahDGAOqqEllAD7hhmz1JpPmj1lKBpJqwmZBY9mcqCB izzg== X-Gm-Message-State: ANoB5pmNMYlsSC6AsHwhBwuH93/Sy5AmCThEpglzW51Ikx3XGPaNCxPQ S/X0s2pBNk1dSADPnTFD5sYB6HDtyYU= X-Google-Smtp-Source: AA0mqf6dL6rHvPPZ75KGmHy3S2M5xqiatLv5bcY/pXAsLgGrWKpFdL33q2KDu08rTr2sUJd+/ZjfEQ== X-Received: by 2002:a05:6e02:1ba8:b0:302:4675:6223 with SMTP id n8-20020a056e021ba800b0030246756223mr4146088ili.3.1668367234036; Sun, 13 Nov 2022 11:20:34 -0800 (PST) Received: from nuc (192-0-220-237.cpe.teksavvy.com. [192.0.220.237]) by smtp.gmail.com with ESMTPSA id g3-20020a056602150300b006a129b10229sm3174830iow.31.2022.11.13.11.20.32 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 13 Nov 2022 11:20:33 -0800 (PST) Date: Sun, 13 Nov 2022 14:20:30 -0500 From: Mark Johnston To: Christopher Bowman Cc: hackers@freebsd.org Subject: Re: I could use some help Message-ID: References: List-Id: Technical discussions relating to FreeBSD List-Archive: https://lists.freebsd.org/archives/freebsd-hackers List-Help: List-Post: List-Subscribe: List-Unsubscribe: Sender: owner-freebsd-hackers@freebsd.org MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: X-Spamd-Result: default: False [0.30 / 15.00]; MID_RHS_NOT_FQDN(0.50)[]; FORGED_SENDER(0.30)[markj@freebsd.org,markjdb@gmail.com]; R_DKIM_ALLOW(-0.20)[gmail.com:s=20210112]; R_SPF_ALLOW(-0.20)[+ip6:2607:f8b0:4000::/36:c]; MIME_GOOD(-0.10)[text/plain]; MLMMJ_DEST(0.00)[hackers@freebsd.org]; MIME_TRACE(0.00)[0:+]; ARC_NA(0.00)[]; FREEMAIL_ENVFROM(0.00)[gmail.com]; RCVD_IN_DNSWL_NONE(0.00)[2607:f8b0:4864:20::136:from]; FROM_NEQ_ENVFROM(0.00)[markj@freebsd.org,markjdb@gmail.com]; ASN(0.00)[asn:15169, ipnet:2607:f8b0::/32, country:US]; RCPT_COUNT_TWO(0.00)[2]; TO_MATCH_ENVRCPT_SOME(0.00)[]; RCVD_TLS_LAST(0.00)[]; FROM_HAS_DN(0.00)[]; RCVD_VIA_SMTP_AUTH(0.00)[]; DKIM_TRACE(0.00)[gmail.com:+]; TO_DN_SOME(0.00)[]; DMARC_NA(0.00)[freebsd.org]; PREVIOUSLY_DELIVERED(0.00)[hackers@freebsd.org]; RCVD_COUNT_THREE(0.00)[3]; DWL_DNSWL_NONE(0.00)[gmail.com:dkim] X-Rspamd-Queue-Id: 4N9MjG74rRz4649 X-Spamd-Bar: / X-ThisMailContainsUnwantedMimeParts: N On Wed, Nov 09, 2022 at 10:18:02PM -0800, Christopher Bowman wrote: > OK I’m really confused and I could use some help: > > 13.0 runs fine on my Xilinx Zynq based board (DIgilent Arty Z20). However when I compile 13.1 release it doesn’t boot. The kernel stops during boot as follows: > > Using DTB from loaded file '/boot/dtb/zynq-artyz7.dtb'. > > > Loading DTB overlays: 'artyz7_ssd_overlay.dtb' > > > /boot/dtb/overlays/artyz7_ssd_overlay.dtb size=0x1a1 > > > Kernel entry at 0x17a00200... > > > Kernel args: (null) > > > applying DTB overlay '/boot/dtb/overlays/artyz7_ssd_overlay.dtb' > > > WARNING: Cannot find freebsd,dts-version property, cannot check DTB compliance > ---<>--- > Copyright (c) 1992-2021 The FreeBSD Project. > Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994 > .The Regents of the University of California. All rights reserved. > FreeBSD is a registered trademark of The FreeBSD Foundation. > FreeBSD 13.0-STABLE #22 n248064-ce9c3848ff3: Wed Nov 9 22:04:45 PST 2022 > crb@eclipse.ChrisBowman.com:/usr/obj/usr/src/arm.armv7/sys/ARTYZ7 arm > FreeBSD clang version 11.0.1 (git@github.com:llvm/llvm-project.git llvmorg-11.0.1-0-g43ff75f2c3fe) > CPU: ARM Cortex-A9 r3p0 (ECO: 0x00000000) > CPU Features: > Multiprocessing, Thumb2, Security, VMSAv7, Coherent Walk > Optional instructions: > UMULL, SMULL, SIMD(ext) > LoUU:2 LoC:2 LoUIS:2 > Cache level 1: > 32KB/32B 4-way data cache WB Read-Alloc Write-Alloc > 32KB/32B 4-way instruction cache Read-Alloc > real memory = 536866816 (511 MB) > avail memory = 515162112 (491 MB) > FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs > random: unblocking device. > random: entropy device external interface > ofwbus0: > simplebus0: on ofwbus0 > simplebus1: on ofwbus0 > l2cache0: mem 0xf8f02000-0xf8f02fff irq 8 on simplebus1 > l2cache0: cannot allocate IRQ, not using interrupt > l2cache0: Part number: 0x3, release: 0x8 > l2cache0: L2 Cache enabled: 512KB/32B 8 ways > gic0: mem 0xf8f01000-0xf8f01fff,0xf8f00100-0xf8f001ff on simplebus1 > gic0: pn 0x39, arch 0x1, rev 0x2, implementer 0x43b irqs 96 > mp_tmr0: mem 0xf8f00200-0xf8f0021f irq 29 on simplebus1 > Timecounter "MPCore" frequency 50000000 Hz quality 800 > mp_tmr1: mem 0xf8f00600-0xf8f0061f irq 36 on simplebus1 > Event timer "MPCore" frequency 50000000 Hz quality 1000 > cpulist0: on ofwbus0 > cpu0: on cpulist0 > cpu1: on cpulist0 > uart0: mem 0xe0000000-0xe0000fff irq 9 on simplebus1 > uart0: console (-1,n,8,1) > zy7_qspi0: mem 0xe000d000-0xe000dfff irq 13 on simplebus1 > zy7_qspi0: must have ref-clock property > device_attach: zy7_qspi0 attach returned 6 > cgem0: mem 0xe000b000-0xe000bfff irq 15 on simplebus1 > miibus0: on cgem0 > rgephy0: PHY 0 on miibus0 > rgephy0: none, 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT-FDX, 1000baseT-FDX-master, auto > rgephy1: PHY 1 on miibus0 > rgephy1: none, 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT-FDX, 1000baseT-FDX-master, auto > cgem0: Ethernet address: 56:99:3e:50:9a:8e > sdhci_fdt0: mem 0xe0100000-0xe0100fff irq 17 on simplebus1 > sdhci_fdt0: 1 slot(s) allocated > mmc0: on sdhci_fdt0 > zy7_devcfg0: mem 0xf8007000-0xf80070ff irq 28 on simplebus1 > Timecounters tick every 1.000 msec > sdhci_fdt0-slot0: Controller timeout > sdhci_fdt0-slot0: ============== REGISTER DUMP ============== > sdhci_fdt0-slot0: Sys addr: 0x00060000 | Version: 0x00008901 > sdhci_fdt0-slot0: Blk size: 0x00005008 | Blk cnt: 0x00000001 > sdhci_fdt0-slot0: Argument: 0x00000000 | Trn mode: 0x00000013 > sdhci_fdt0-slot0: Present: 0x01ff0202 | Host ctl: 0x00000001 > sdhci_fdt0-slot0: Power: 0x0000000f | Blk gap: 0x00000000 > sdhci_fdt0-slot0: Wake-up: 0x00000000 | Clock: 0x00004007 > sdhci_fdt0-slot0: Timeout: 0x00000006 | Int stat: 0x00000001 > sdhci_fdt0-slot0: Int enab: 0x01ff00fb | Sig enab: 0x01ff00fa > sdhci_fdt0-slot0: AC12 err: 0x00000000 | Host ctl2:0x00000000 > sdhci_fdt0-slot0: Caps: 0x69ec0080 | Caps2: 0x00000000 > sdhci_fdt0-slot0: Max curr: 0x00000001 | ADMA err: 0x00000000 > sdhci_fdt0-slot0: ADMA addr:0x00000000 | Slot int: 0x00000000 > sdhci_fdt0-slot0: =========================================== > sdhci_fdt0-slot0: Controller timeout > sdhci_fdt0-slot0: ============== REGISTER DUMP ============== > sdhci_fdt0-slot0: Sys addr: 0x00000000 | Version: 0x00008901 > sdhci_fdt0-slot0: Blk size: 0x00005008 | Blk cnt: 0x00000001 > sdhci_fdt0-slot0: Argument: 0xaaaa0000 | Trn mode: 0x00000013 > sdhci_fdt0-slot0: Present: 0x01ff0000 | Host ctl: 0x00000001 > sdhci_fdt0-slot0: Power: 0x0000000f | Blk gap: 0x00000000 > sdhci_fdt0-slot0: Wake-up: 0x00000000 | Clock: 0x00004007 > sdhci_fdt0-slot0: Timeout: 0x00000006 | Int stat: 0x00000001 > sdhci_fdt0-slot0: Int enab: 0x01ff00fb | Sig enab: 0x01ff00fa > sdhci_fdt0-slot0: AC12 err: 0x00000000 | Host ctl2:0x00000000 > sdhci_fdt0-slot0: Caps: 0x69ec0080 | Caps2: 0x00000000 > sdhci_fdt0-slot0: Max curr: 0x00000001 | ADMA err: 0x00000000 > sdhci_fdt0-slot0: ADMA addr:0x00000000 | Slot int: 0x00000000 > sdhci_fdt0-slot0: =========================================== > > The controller register dumps continue for a while and then I get: > > mmc0: CMD7 failed, RESULT: 1 > Release APs > Trying to mount root from ufs:/dev/mmcsd0s2a [rw]... > mountroot: waiting for device /dev/mmcsd0s2a... > Mounting from ufs:/dev/mmcsd0s2a failed with error 19. > Trying to mount root from ufs:mmcsd0s2a []... > mountroot: waiting for device mmcsd0s2a... > Mounting from ufs:mmcsd0s2a failed with error 19. > > Loader variables: > vfs.root.mountfrom=ufs:/dev/mmcsd0s2a > vfs.root.mountfrom.options=rw > > Manual root filesystem specification: > : [options] > Mount using filesystem > and with the specified (optional) option list. > > eg. ufs:/dev/da0s1a > zfs:zroot/ROOT/default > cd9660:/dev/cd0 ro > (which is equivalent to: mount -t cd9660 -o ro /dev/cd0 /) > > ? List valid disk boot devices > . Yield 1 second (for background tasks) > Abort manual input > > mountroot> > > If I replace the kernel with a 13.0 kernel, it boots just fine. > > So I go and do a git bisect between release/13.0.0 and release/13.1.0 > > and I get the following: > > git bisect start '--first-parent' > # status: waiting for both good and bad commits > # good: [ea31abc261ffc01b6ff5671bffb15cf910a07f4b] 13.0: update to RELEASE > git bisect good ea31abc261ffc01b6ff5671bffb15cf910a07f4b > # status: waiting for bad commit, 1 good commit known > # bad: [fc952ac2212b121aa6eefc273f5960ec3e0a466d] Update in preparation of 13.1-RELEASE > git bisect bad fc952ac2212b121aa6eefc273f5960ec3e0a466d > # skip: [4c44dbde5491516eba8725dc51d39c1dcc817472] MFC jail: Handle a parent jail when a child is added to it > git bisect skip 4c44dbde5491516eba8725dc51d39c1dcc817472 > # good: [476f87219f408343846254743c7189076be80c04] wpi: Fix a lock leak in an error path in wpi_run() > git bisect good 476f87219f408343846254743c7189076be80c04 > # bad: [05bf7d68c56830e52dee14dc87c07d6716e8195e] aesni: Fix an out-of-bounds read in AES_GCM_decrypt() > git bisect bad 05bf7d68c56830e52dee14dc87c07d6716e8195e > # good: [014ae00ef6edca2687d618e0bda138086a1e1230] date: Capitalize seconds string in synopses > git bisect good 014ae00ef6edca2687d618e0bda138086a1e1230 > # bad: [08d995ca8f6f1008a10e4bf4d924824c040f842a] swapoff_one(): only check free pages count manually turning swap off > git bisect bad 08d995ca8f6f1008a10e4bf4d924824c040f842a > # bad: [81b6dba1a08b031bdf7463c1704d27ae1e0daa0f] ktls: Fix assertion for TLS 1.0 CBC when using non-zero starting seqno. > git bisect bad 81b6dba1a08b031bdf7463c1704d27ae1e0daa0f > # bad: [67efa8b29930f12dae2bf237fa7c2ce1dafbd6b1] net80211: add a driver-private pointer to struct ieee80211_node > git bisect bad 67efa8b29930f12dae2bf237fa7c2ce1dafbd6b1 > # good: [109330155000bfec215ee39148254d2a0b628798] module(9): Document that evhand can be NULL > git bisect good 109330155000bfec215ee39148254d2a0b628798 > # bad: [4c8e29637456bbbe709425f691f637914658009f] LinuxKPI: add module_pci_driver() and pci_alloc_irq_vectors() > git bisect bad 4c8e29637456bbbe709425f691f637914658009f > # bad: [4a03ae8d17ddf3d3b57ca281000fd98e200b92cc] nfscl: Fix use after free for forced dismount > git bisect bad 4a03ae8d17ddf3d3b57ca281000fd98e200b92cc > # bad: [de957de097857fabb69a59a9ba36276c5e735de5] bhyve: Fix the WITH_BHYVE_SNAPSHOT build > git bisect bad de957de097857fabb69a59a9ba36276c5e735de5 > # bad: [5c2e6d9610f1b3f1d7c5d69b925212a7b1fd9391] hwpmc: initialize arm64 counter/interrupt state > git bisect bad 5c2e6d9610f1b3f1d7c5d69b925212a7b1fd9391 > # bad: [ce9c3848ff369467749f59fd24f8b9f1241e725c] uma: Fix handling of reserves in zone_import() > git bisect bad ce9c3848ff369467749f59fd24f8b9f1241e725c > # good: [d5ebaa6f8f850bb6f6273f01386832efcb295827] uma: Improve M_USE_RESERVE handling in keg_fetch_slab() > git bisect good d5ebaa6f8f850bb6f6273f01386832efcb295827 > # first bad commit: [ce9c3848ff369467749f59fd24f8b9f1241e725c] uma: Fix handling of reserves in zone_import() > > If I do git log ce9c3848ff369467749f59fd24f8b9f1241e725c it does seem that d5ebaa6f8f850bb6f6273f01386832efcb295827 is the previous commit and that it works just fine but ce9c3848ff369467749f59fd24f8b9f1241e725c doesn’t boot. It’s literally the same file system and DTB, the only difference is the kernel installed. > > What’s confusing to me is that looking at ce9c3848ff369467749f59fd24f8b9f1241e725c I can’t see how that commit would result in my kernel hanging while booting. It's indeed a bit hard to see why commit ce9c3848ff369467749f59fd24f8b9f1241e725c might cause an sdhci command timeout. I suspect the commit itself is innocent, but happens to uncover a problem elsewhere by changing the order in which page allocations are distributed to different consumers. At the time of the command timeouts, the kernel is still running on a single CPU so the order in which pages are allocated should be fairly deterministic and thus consistent across multiple boots. One thing you can try is to take a releng/13.1 tree and revert just ce9c3848ff369467749f59fd24f8b9f1241e725c, and/or take releng/13.0 and apply just that commit. It would be useful to know whether either of the resulting kernels can boot successfully. Could you also please post your kernel configuration (ARTYZ7) somewhere, together with a dmesg of a successful boot?