FYI: various 11.0-CURRENT -r293227 (and older) hangs on arm (rpi2): a description of sorts

Mark Millard markmi at dsl-only.net
Thu Jan 7 19:24:13 UTC 2016


On 2016-Jan-7, at 8:12 AM, Ian Lepore <ian at freebsd.org> wrote:
> 
> On Thu, 2016-01-07 at 02:19 -0800, Mark Millard wrote:
>> I've had various hangs when the rpi2 was busy over longish periods,
>> both debug buildkernel/buildworld builds of the arm and non-debug
>> variants. No log files or console messages produced.
>> 
>> I've not had any analogous issues with powerpc64 (PowerMac G5) or
>> with amd64 (Virtual Box used on Mac OS X).
>> 
>> I've finally discovered that if I have, say, top running on the rpi2
>> serial console that top continues to update its display so long as I
>> leave it alone during the hang. (Otherwise it hangs too.) So I
>> finally have a little window for seeing some of what is happening.
>> 
>> An example top display showed after the hang:
>> 
>> Mem: 764M Active 12M Inact 141M Wired 98M Buf 8k free
>> Swap: 2048M Total 29M Used 2019M Free 1% in use
>> 
>> (Yep: Just 8K free Mem.)
>> 
> 
> That's not a problem.
> 
>> The unusual STATEs for processes seemed to be (for the specific
>> hang):
>> 
>> STATE   COMMANDs
>> pfault  [ld] [ld] /usr/sbin/syslogd
>> vmwait  [ld] [md0] [kernel]
>> wswbuf  [pagedaemon]
>> 
>> Those same 3 states seem to always be involved. Some of the processes
>> vary from one hang to the next: the prior hang had build/genautoma ,
>> /usr/sbin/moused , and /usr/sbin/ntpd instead of 3 [ld]'s.
>> 
>> /usr/sbin/syslogd, [md0], [kernel], and [pagedaemon] and their states
>> do not seem to vary (so far).
>> 
>> 
> 
> Everything is backed up waiting for slow sdcard IO.  You can get an
> amd64 system with many cores and gigabytes of ram into the same state
> with an sdcard (or any other storage device that takes literally
> seconds for any individual IO to complete).  All the available buffers
> get queued up to the one slow device, then you can't do anything that
> requires IO (even launch tools to try to figure out what's going on).
> 
> -- Ian

This is not the (or an) sdcard case: the root file system is on a fast, 400GB+ SSD that is USB 3.0 capable (not that the rpi2 uses it that way). Note the "da0" lines, the size, and such below (everything other than /boot/msdos is on the SSD):

ugen0.5: <Other World Computing> at usbus0
umass0: <Other World Computing Envoy Pro, class 0/0, rev 2.10/1.00, addr 5> on usbus0
umass0:  SCSI over Bulk-Only; quirks = 0x0100
umass0:0:0: Attached to scbus0
da0 at umass-sim0 bus 0 scbus0 target 0 lun 0
da0: <ASMT 2105 0> Fixed Direct Access SPC-4 SCSI device
da0: Serial Number XXXXXXXXXXXX
Release APs
da0: 40.000MB/s transfers
da0: 457862MB (937703088 512 byte sectors)
da0: quirks=0x2<NO_6_BYTE>
Trying to mount root from ufs:/dev/ufs/RPI2rootfs [rw,noatime]...
. . .
Starting file system checks:
/dev/ufs/RPI2rootfs: FILE SYSTEM CLEAN; SKIPPING CHECKS
/dev/ufs/RPI2rootfs: clean, 109711666 free (14002 frags, 13712208 blocks, 0.0% fragmentation)
Mounting local file systems:.
. . .

> Filesystem          1M-blocks  Used  Avail Capacity  Mounted on
> /dev/ufs/RPI2rootfs    443473 16791 391203     4%    /
> devfs                       0     0      0   100%    /dev
> /dev/mmcsd0s1              49     7     42    15%    /boot/msdos
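
As a sanity check that the dmesg and df figures above describe the same large device rather than an sdcard, here is a minimal Python sketch of the arithmetic. The numbers are copied from the output above; the variable names are only illustrative, and the truncation and rounding choices are assumptions about how those tools present their values.

```python
# Figures copied from the dmesg and df output quoted above.
SECTOR_BYTES = 512            # "512 byte sectors" from the da0 line
MIB = 1024 * 1024

sectors = 937703088           # da0 sector count reported by dmesg
# Assumption: the MB figure in dmesg is the byte count truncated to whole MiB.
size_mib = sectors * SECTOR_BYTES // MIB
print("da0 size in MiB:", size_mib)

used_mib, avail_mib = 16791, 391203   # df "Used" and "Avail" columns for /
# Assumption: df's Capacity column is used / (used + avail), rounded.
capacity_pct = round(100 * used_mib / (used_mib + avail_mib))
print("root capacity in percent:", capacity_pct)
```

Both results agree with the figures reported above for da0 and for /.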


In USB 3.0 contexts I have never observed an IO taking seconds on these types of SSDs, and I use them that way extensively. Nor have I seen it in USB 2.0 use, though that is a less common context for me. Nor have I had any problems with the type of USB 3.0 capable hub messing up IO.

I use this type of SSD to hold the VirtualBox virtual machine(s) in which I run amd64 FreeBSD on Mac OS X. No problems there. But it is true that I've never directly booted amd64 FreeBSD from one of these SSDs in a non-virtual amd64 context.

Setting that aside for a moment: so this is acceptable/expected FreeBSD behavior when a "disk" device is slow? Interesting. I've let it sit for hours and the hang does not clear: it is effectively deadlocked for overall usage. The rpi2 will never be able to do buildworld, buildkernel, ports builds, etc. reliably if this is the sort of behavior that results.

Back to this context: Is there a way for me to confirm the queuing of buffers to the SSD? Or at least to get some detail about its buffer usage? Can I get some information from ddb that would confirm/deny/provide insight?






===
Mark Millard
markmi at dsl-only.net


