From: "Rodney W. Grimes" <freebsd-rwg@gndrsh.dnsmgr.net>
Message-Id: <202105142048.14EKmm9K082795@gndrsh.dnsmgr.net>
Subject: Re: FYI for aarch64 main [14] running a mid March version: I ended up with [usb{usbus2}] stuck at (near) 100% cpu
In-Reply-To: <14C02B9D-7342-41EF-AADD-E466C04E17EA@yahoo.com>
To: Mark Millard <marklmi@yahoo.com>
Date: Fri, 14 May 2021 13:48:48 -0700 (PDT)
CC: "Rodney W. Grimes" <freebsd-rwg@gndrsh.dnsmgr.net>, freebsd-arm <freebsd-arm@freebsd.org>, freebsd-current <freebsd-current@freebsd.org>
List-Id: Porting FreeBSD to ARM processors

> > On 2021-May-14, at 05:52, Rodney W. Grimes wrote:
> 
> >> Note: The context was using a non-debug main build
> >> from mid-2021-Mar. (More details identified
> >> later.)
> >> 
> >> The issue happened while attempting a:
> >> 
> >> # zfs send -R zpold@for-copy | zfs recv -Fdv zpnew
> >> 
> >> where the drives involved in the command were:
> >> 
> >> zpold: a USB3 SSD, using /dev/da0p3
> >> zpnew: a 480 GiByte Optane in the PCIe slot, using /dev/nda0p3
> >> 
> >> with:
> >> 
> >> # gpart show -pl
> >> =>        40  468862048    da0  GPT (224G)
> >>           40     532480  da0p1  4C8GCA72EFI (260M)
> >>       532520       2008         - free - (1.0M)
> >>       534528   29360128  da0p2  4C8GCA72swp14 (14G)
> >>     29894656    4194304         - free - (2.0G)
> >>     34088960   33554432  da0p4  4C8GCA72swp16 (16G)
> >>     67643392  401217536  da0p3  4C8GCA72zfs (191G)
> >>    468860928       1160         - free - (580K)
> >> 
> >> =>         40  2000409184    ada0  GPT (954G)
> >>            40      409600  ada0p1  (null) (200M)
> >>        409640  1740636160  ada0p2  FBSDmacchroot (830G)
> >>    1741045800    58720256  ada0p3  FBSDmacchswp0 (28G)
> >>    1799766056   176160768  ada0p4  FBSDmacchswp1 (84G)
> >>    1975926824    24482400          - free - (12G)
> >> 
> >> =>        40  937703008    nda0  GPT (447G)
> >>           40     532480  nda0p1  CA72opt0EFI (260M)
> >>       532520       2008          - free - (1.0M)
> >>       534528  117440512  nda0p2  CA72opt0swp56 (56G)
> >>    117975040   16777216          - free - (8.0G)
> >>    134752256  134217728  nda0p4  CA72opt0swp64 (64G)
> >>    268969984  668731392  nda0p3  CA72opt0zfs (319G)
> >>    937701376       1672          - free - (836K)
> >> 
> >> The system running was that on /dev/ada0p2 (FBSDmacchroot,
> >> which is UFS instead of ZFS).
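(For reference, the replication command above can be rehearsed before it is run for real; a minimal sketch using the pool names from this report, where the -n flag on the receive, added here, makes it a dry run:

  # zfs send -R zpold@for-copy | zfs recv -Fdvn zpnew   # dry run: only report what would be received
  # zfs send -R zpold@for-copy | zfs recv -Fdv zpnew    # the actual transfer, as above

-R sends a full replication stream of the pool's datasets and snapshots; on the receive side, -F forces a rollback of the target where needed, -d derives the dataset names from the stream rather than the command line, and -v reports progress.)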
> >> 
> >> The [usb{usbus2}] process eventually got stuck-busy, no
> >> more I/O:
> >> 
> >> CPU 0: 0.0% user, 0.0% nice, 100% system, 0.0% interrupt, 0.0% idle
> >> CPU 1: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle
> >> CPU 2: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle
> >> CPU 3: 0.4% user, 0.0% nice, 0.0% system, 0.0% interrupt, 99.6% idle
> >> 
> >> PID USERNAME PRI NICE SIZE RES STATE C TIME CPU COMMAND
> >> 15 root -72 - 0B 262144B CPU0 0 8:51 99.95% [usb{usbus2}]
> >> 
> >> 1295 root -8 0 20108Ki 8092Ki q->bq_ 2 0:04 0.00% zfs recv -Fdv zpnew{receive_writer_thre}
> >> 1295 root 48 0 20108Ki 8092Ki piperd 2 0:22 0.00% zfs recv -Fdv zpnew{zfs}
> >> 1294 root -8 0 17544Ki 7740Ki q->bq_ 2 0:01 0.00% zfs send -R zpold@for-copy{send_reader_thread}
> >> 1294 root -8 0 17544Ki 7740Ki q->bq_ 0 0:00 0.00% zfs send -R zpold@for-copy{send_merge_thread}
> >> 1294 root -8 0 17544Ki 7740Ki hdr->b 2 0:00 0.00% zfs send -R zpold@for-copy{send_traverse_threa}
> >> 1294 root 52 0 17544Ki 7740Ki range- 3 0:20 0.00% zfs send -R zpold@for-copy{zfs}
> >> 
> >> 1036 root -8 - 0B 1488Ki t->zth 0 0:00 0.00% [zfskern{z_checkpoint_discar}]
> >> 1036 root -8 - 0B 1488Ki t->zth 1 0:00 0.00% [zfskern{z_livelist_condense}]
> >> 1036 root -8 - 0B 1488Ki t->zth 2 0:00 0.00% [zfskern{z_livelist_destroy}]
> >> 1036 root -8 - 0B 1488Ki t->zth 1 0:00 0.00% [zfskern{z_indirect_condense}]
> >> 1036 root -8 - 0B 1488Ki mmp->m 3 0:00 0.00% [zfskern{mmp_thread_enter}]
> >> 1036 root -8 - 0B 1488Ki tx->tx 1 0:00 0.00% [zfskern{txg_thread_enter}]
> >> 1036 root -8 - 0B 1488Ki tx->tx 2 0:00 0.00% [zfskern{txg_thread_enter}]
> >> 
> >> I was unable to ^C or ^Z the process where I
> >> typed the command. I eventually stopped the
> >> system with "shutdown -p now" from an ssh
> >> session (that had already been in place).
> > 
> > Should this occur again, before doing the shutdown run a
> > "zpool status". I have gotten into this state when the
> > recv pool was a USB device
> 
> The USB device had the send pool in my example.
> 
> > and for some reason it had timed out and gone offline.
> 
> No messages about timeouts or anything similar were produced.
> 
> > The clues that this occurred are in dmesg and zpool status.
> 
> No console, dmesg -a, or /var/log/messages output was
> generated. (And the system was running from a SATA SSD
> that was operating well.)
> 
> For reference, the USB Ethernet device that was in a
> USB2 port continued to operate just fine, allowing the
> use of existing ssh sessions that were displaying
> gstat -spod and top -Samio -ototal until I started
> looking at the problem. (I did not try making a new
> ssh session.)
> 
> I did not do the "zpool status", so I cannot report
> on it.
> 
> > Unplug/plug the USB device, check dmesg that it came online,
> > and do a zpool clear.
> 
> Okay. I normally avoid unplugging USB storage media
> if the system overall does not hang up, in hopes of
> a clean shutdown leaving things in a better state.

Do the zpool status first, and only if it indicates a device
offline or some other problem would you proceed with the
unplug/plug; at that point ZFS has stopped doing anything
to the device, so as far as ZFS and that pool are concerned
your shutdown won't do anything anyway.

> The system did appear to shut down to completion.

Yes, you can shut down a system with a zpool in a failed state.

> >> When I retried after rebooting and scrubbing (no
> >> problems found), the problem did not repeat.
> >> 
> >> I do not have more information nor a way to repeat
> >> the problem on demand, unfortunately.
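(For reference, the recovery sequence described above, sketched concretely; this assumes the affected pool is zpold on da0 as in this report, and the device name may differ after a re-plug:

  # zpool status zpold    # look for a vdev marked REMOVED, FAULTED, or OFFLINE
  # dmesg | tail          # look for USB detach or timeout messages
  (unplug the USB device, wait a moment, plug it back in)
  # dmesg | tail          # confirm the device re-attached, e.g. as da0 again
  # zpool clear zpold     # clear the pool's error state so I/O can resume

zpool clear with only a pool name clears device errors pool-wide; a specific device can be appended to limit the scope.)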
> >> Details of the vintage of the system software and
> >> such:
> >> 
> >> # ~/fbsd-based-on-what-freebsd-main.sh
> >> FreeBSD FBSDmacch 14.0-CURRENT FreeBSD 14.0-CURRENT mm-src-n245445-def0058cc690 GENERIC-NODBG arm64 aarch64 1400005 1400005
> >> def0058cc690 (HEAD -> mm-src) mm-src snapshot for mm's patched build in git context.
> >> merge-base: 7381bbee29df959e88ec59866cf2878263e7f3b2
> >> merge-base: CommitDate: 2021-03-12 20:29:42 +0000
> >> 7381bbee29df (freebsd/main, freebsd/HEAD, pure-src, main) cam: Run all XPT_ASYNC ccbs in a dedicated thread
> >> n245444 (--first-parent --count for merge-base)
> >> 
> >> The system was a MACCHIATObin Double Shot.
> 
> ===
> Mark Millard
> marklmi at yahoo.com
> ( dsl-only.net went
> away in early 2018-Mar)

-- 
Rod Grimes                                                 rgrimes@freebsd.org
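(For anyone without Mark's fbsd-based-on-what-freebsd-main.sh script, roughly the same version details can be gathered with stock tools; a sketch, assuming the source tree that produced the build is checked out in /usr/src:

  # uname -aKU                                    # kernel ident plus kernel/userland osreldate, the 1400005 values above
  # git -C /usr/src log -1 --format='%h %cI %s'   # hash, commit date, and subject of the checked-out commit
)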