System hang on shutdown when running freebsd-update

Bengt Ahlgren bengta at sics.se
Wed Oct 29 09:36:46 UTC 2014


Kevin Oberman <rkoberman at gmail.com> writes:

> On Tue, Oct 28, 2014 at 3:09 PM, Walter Hop <freebsd at spam.lifeforms.nl>
> wrote:
>
>> [Apologies for not replying directly to the thread; I found it at
>> https://lists.freebsd.org/pipermail/freebsd-stable/2014-October/080595.html
>> ]
>>
>> I noticed this same hang after upgrading from 10.0-RELEASE to 10.1-RC3 in
>> a VM running under VMware Fusion, so the problem appears still present.
>>
>> I could only make it happen in the single uptime just after the system was
>> freebsd-updated from FreeBSD 10.0 to 10.1-RC3.
>>
>> Here is a screenshot: http://lf.ms/wait-for-reboot.png
>>
>> It did not make any progress after 2 hours of waiting. When restarting the
>> VM, the disk was dirty.
>>
>> Some interesting facts:
>> - Note "swapoff: /dev/da0p2: Cannot allocate memory" in the screenshot
>> which might pose a clue. I haven’t seen this normally.
>> - FreeBSD does respond to ping while it is busy, so it is not a complete
>> "freeze".
>> - The VM is at 100% CPU while this is going on.
>>
>> I have created a snapshot of the VM in the failed state, so maybe some
>> useful information could be retrieved from it, although I don’t have any
>> experience with kernel debugging over VMware.
>>
>> Cheers,
>> WH
>>
>> --
>> Walter Hop | PGP key: https://lifeforms.nl/pgp
>>
> I am starting to suspect that some code that is needed to flush a resource
> that is blocking the complete shutdown is no longer available so waiting is
> not going to work. I tried a simple "shutdown now" and waited in single
> user mode for a minute before "reboot". It worked fine.
>
> This is based on guesswork, but seems to fit the symptoms.

Some more guesswork that better fits Walter's symptom than Kevin's...

I have noticed that our server with large amounts of disk (three ZFS
pools with 22x4TB disks) and 128GB RAM often takes quite some time to
shut down after syncing the disks.  The last time it was on the order of
10 minutes, but it has always completed.

It seems to be related to swap.  Swap is on dedicated GPT partitions on
two system disks, and during those 10 minutes, it first accesses the
first of these disks, then the other.  I know for sure that the accesses
to the second disk must be to swap, because swap is the only partition
currently in use on that disk.

I believe that it had on the order of 6GB pushed out to swap the last
time.  It is running 9.3-REL without Denninger's ZFS patches, so it
tends to push some stuff to swap.
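
To put rough numbers on this, here is a minimal Python sketch that sums
the "Used" column of `swapinfo -k` per device and works out the average
rate implied by a given amount of swap and a given shutdown delay.  It
assumes the usual five-column `swapinfo -k` layout (Device, 1K-blocks,
Used, Avail, Capacity) with a "Total" row when more than one device is
configured.  The function names are made up for illustration, and it
only measures; it does not show what the kernel is actually doing
during shutdown.

```python
import subprocess


def parse_swapinfo(text):
    """Return {device: used_KiB} from `swapinfo -k` style output.

    Assumes five whitespace-separated columns per device row:
    Device, 1K-blocks, Used, Avail, Capacity.
    """
    used = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines, the header row, and the "Total" summary row.
        if len(fields) < 5 or fields[0] in ("Device", "Total"):
            continue
        used[fields[0]] = int(fields[2])
    return used


def implied_rate_mib_s(used_kib, seconds):
    """Average MiB/s needed to move used_kib KiB in the given time."""
    return used_kib / 1024 / seconds


def current_swap_usage():
    """Run `swapinfo -k` on the local (FreeBSD) host and parse it."""
    out = subprocess.run(
        ["swapinfo", "-k"], capture_output=True, text=True, check=True
    ).stdout
    return parse_swapinfo(out)
```

With about 6GB in swap and a delay of about 10 minutes,
`implied_rate_mib_s(6 * 1024 * 1024, 600)` comes out at roughly 10 MiB/s
on average.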

Is there some swap GC going on before shutdown that can take this time?

Bengt

