shutdown -r / shutdown -h / reboot all hang and don't cleanly dismount

Wed Jun 19 17:53:15 UTC 2013

On 6/19/2013 22:04, Jeremy Chadwick wrote:
> On Wed, Jun 19, 2013 at 09:15:18PM +0700, Adam Strohl wrote:
>> On 6/19/2013 20:35, Jeremy Chadwick wrote:
>
> I've snipped out portions which aren't relevant at this point in the
> convo.  I'm trying to be terse as much as possible here (honest).
>
> To recap for readers/mailing list:
>
> - Adam sees the same behaviour on systems on bare metal, as well as
>    FreeBSD guests running under VMware ESXi 5.0 hypervisor.  However,
>    as I stated on the list just yesterday about "lock-ups on shutdown",
>    every situation may be different and there is a well-established
>    history of this problem on FreeBSD where the root causes (bugs)
>    were completely different from one another.
>
> - The system we're discussing at this point in the thread is on
>    bare metal -- specifically an Asus P8B-X motherboard, with BIOS
>    version 6103, driven entirely by on-board Intel AHCI (not BIOS-level
>    RAID).
>
> - Adam runs 9.1-RELEASE because of business needs pertaining to
>    freebsd-update and binary updates.  (I ask more about this for
>    the benefit of readers below, however -- because this situation comes
>    up a lot and I want to know what real-world admins do)
>

This is all correct.

>>> Thanks.  I was mainly interested in the storage controller being used
>>> (in this case ahci(4)) and the disks being used (notorious ST3000DM001,
>>> known for excessively parking heads).
>>
>> Yeah, was not my first choice but then again ... RAIDZ-2 :)  HD
>> supply chain here (Thailand) is weird considering how many are made
>> here (and can't buy).  Smartd screams about them possibly needing a
>> firmware update (they don't according to Seagate).   Had no issues
>> aside from a failure a month or so ago (it's an HD ... it
>> happens).
>
> Absolutely understood -- and FYI, in case you need backup, your thought
> process/conclusion here is spot on (re: "it's a MHDD, failures happen").

Indeed :-D

>
> Irrelevant to your shutdown problem: as for smartmontools bitching about
> the firmware: no vendors disclose what actual changes go into their
> drive firmware updates (vendors if you are reading this: I will have
> your souls...), so I have to read a bunch of end-user forums where
> nobody knows what they're talking about, and then of course find this
> "highly educational" *cough* article from Adaptec:
>
> http://ask.adaptec.com/app/answers/detail/a_id/17241/~/known-issues-with-seagate-barracuda-7200.14-desktop-drives
>

Yeah I agree .. I tried to firmware upgrade them when I was building the 
system but it said they didn't qualify when using the boot ISO.  I just 
checked the site and it says no firmware update available too when using 
their search by serial # tool.   At this point I'm leery about updating 
given that I've got data on it anyway.  I do occasionally (maybe once a 
week or two and they're in the same room as me/my office) hear one parking.

I see nothing wrong in smart though, no dmesg errors and have noticed no 
issues with the array and it bench tests at around 850 MB/sec.  Too bad 
10 Gbit equipment isn't cheaper.

Also when I bought the 6 for this array I got a 7th as a cold spare :P

> The problem here is that there have been *so many* firmware bugs with
> Seagate's drives in the past 2 years or so that it's impossible for me
> to know which fixes what.  You buy what you buy because that's what you
> buy, and that's cool -- but I avoid their stuff like the plague.

Yeah.  I'd prefer WD myself but this place is swimming in "green" and 
now "red" drives.  Ugh.

<< Snipping out the unrelated parts ... >>

> Can you try removing VESA and SC_PIXEL_MODE please?  I know that
> sounds crazy ("what on earth would that have to do with it?"), but
> please try it.  I can explain the justification if need be -- I'm being
> extra paranoid of something that got discovered here on -stable only a
> few days ago.  It's a stretch, but I can see potential relevance.  I can
> provide details/links later.

No change unfortunately.

>
>>>>> 4. Does "sysctl hw.usb.no_shutdown_wait=1" help you?
>>>>
>>>> Weirdly this allowed it to reboot on the first try (without needing
>>>> to be reset), but not the second.
>>>
>>> I'm not surprised.  Please re-try with stable/9; Hans has been constantly
>>> working on the USB stack and fixing major bugs.
>>
>> Got it but probably not going to go this route as it means no more
>> binary upgrades.  While I can reboot it, it is the office NAS here
>> and so 'testing out' -STABLE I think probably isn't going to happen.
>
> I understand.  I have a question relating to this below.
>
>>> Place background_fsck="no" in /etc/rc.conf.  If the machine does not
>>> have a clean filesystem on boot-up, you'll know because the system will
>>> immediately begin fsck (in the foreground actively).  You'll recognise
>>> that output if it happens, trust me.
>>
>> Preaching to the choir: we set this on all servers, but this one somehow
>> did not have it set (I think due to ZFS making it unique and not
>> copying our rc.conf template over properly).
>
> Where should I send my bill for services rendered?  (Totally kidding --
> just had some breakfast so feeling chipper :-) )

Ha!

<<SNIP>>

>
> Two questions:
>
> 1. Does 9.1-RELEASE-p3 have the issue?  We really need to know this.
> If it's specific to -p4 then we can narrow down what the cause is.
> I'd ask for further testing if possible, because it would really
> help the kernel folks if we can narrow this down to a commit.

I rolled back the kernel to p3 and saw no change.  AFAIK the only two 
files changed between p3 and p4 are in the kernel, so that would 
effectively be a complete rollback.

I wish I had more conclusive documentation on times/occurrences; it 
wasn't until later that I started to see the pattern.

>
> 2. This is for me and the benefit of the readers: given that you cannot
> (will not) move to stable/9: what is your plan of action now?  This
> issue for you is very important (I would rank it severe + high pri),
> so now you are in a rut.  What is your next move?

Two things ... talk to a client about testing/rebooting one of their 
servers since it is part of a big pool and reliably shows the issue (and 
does not involve ZFS).   They are affected by this too so I don't think 
it'll be too hard of a sell.

I will also keep an eye out for any VMs I run across that do it and 
bring that up in a separate thread (and if whatever happens with the 
above does not apply).

>
>>> I am now going to ask you for more information:
>>>
>>> 1. "gpart show -p xxx" where xxx is each disk you have in the system
>> {snipped}
>
> Looks fine to me -- and kudos on the proper alignment (since those
> drives are indeed 4KB sector drives).  Also kudos to using GPT with
> these because of gmirror, and mirroring the partitions, solely to
> work around the gmirror metadata vs. GPT backup header problem.

Thanks.  There is a huge lack of good clear documentation on this 
unfortunately.  I wrote two blog entries about this last year:

http://www.ateamsystems.com/blog/Installing-FreeBSD-9-gmirror-GPT-partitions-raid-1

http://www.ateamsystems.com/blog/FreeBSD-Partition-Alignment-RAID-SSD-4k-Drive

They remain some of the most popular things on our site heh.

>
>>> 2. gmirror list
>> {snipped -- I think this looks okay, but I have only used gmirror
>> 2 or 3 times in my life}
>>
>>> 3. Any/all details of your gmirror setup or other things you can
>>>     think of when you set it up
>>
>> The only thing is that we use GMIRROR on the partition level because
>> we use GPT (which is clear from the gpart output I think).  I
>> gmirror the boot partition only in this case as I use ZFS backed
>> swap and ZFS root for this server.
>
> This is irrelevant to the problem (fairly sure), but: I believe
> ZFS-backed swap has still been given "do not do this" status.  I've
> never done it, I just read lots and lots and lots of discussions about
> it.  I would have stuck it as a partition on one of those drives, honestly.

My view was that this server has 32 GiB of RAM and really should never 
swap anyway.  It felt silly to do a 6-way GMIRROR of a GiB or two of 
swap, basically.

>
>>> {snipping /etc/fstab and /boot/loader.conf}
>
> Looks good, except loader.conf and use of powerd(8) below (also
> irrelevant to your shutdown problem).
>
>> # ---- Power management enables SpeedStep and TurboBoost
>> #
>> powerd_enable="YES"
>> powerd_flags="-a hiadaptive"
>
> Probably irrelevant to the shutdown problem: TMK powerd(8) has nothing
> to do with TurboBoost.  It does have to do with SpeedStep, but you need
> some special variables in /boot/loader.conf for this to work, otherwise
> you're using "ACPI throttling" which is different (and actually, hmm, I
> wonder if that may be upsetting something during shutdown (!)).
>
> What you want are these, which make use of CPU P-state throttling (i.e.
> EIST), and it's very smooth and doesn't use ACPI throttling.  I used
> this on our Intel production Supermicro servers *and* my generic desktop
> motherboards for years with absolute success:
>
> # Enable use of P-state CPU frequency throttling.
> # http://wiki.freebsd.org/TuningPowerConsumption
> #
> hint.p4tcc.0.disabled="1"
> hint.acpi_throttle.0.disabled="1"

VERY COOL.  My research indicated (I wish I saved the mailing list 
threads) that basically TurboBoost doesn't engage without SpeedStep and 
powerd was what did that under FreeBSD.   I've documented clear 
performance gains [with certain systems] 
http://www.ateamsystems.com/blog/Increase-FreeBSD-Performance-With-powerd so 
powerd does something.  Maybe there is an auto mode?  I'd love to think 
the performance gains would stack, too!

I will have to test the above / below!
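
A rough way to test whether the two loader.conf hints took effect, as a 
sketch only: compare the number of entries in dev.cpu.0.freq_levels 
before and after the reboot.  The assumption here is that the throttling 
drivers add interpolated levels on top of the real P-states, so the list 
should get shorter once they are disabled.  count_freq_levels is a 
made-up helper name, not a system tool.

```shell
# count_freq_levels: count the entries in a freq_levels string.
# Input looks like "3401/95000 3400/95000 3200/86903" (MHz/milliwatts).
count_freq_levels() {
    # Unquoted on purpose: split the string on whitespace into $1, $2, ...
    set -- $1
    echo "$#"
}

# On the FreeBSD box itself (not run here):
#   count_freq_levels "$(sysctl -n dev.cpu.0.freq_levels)"
```

Run it once before adding the hints and once after rebooting with them; 
the numbers themselves will differ per CPU.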

>
> As for my rc.conf and powerd, this is what I use:
>
> powerd_enable="yes"
> performance_cx_lowest="C2"
> economy_cx_lowest="C2"
>
> You may consider removing the last 2 lines -- in fact, I'm not sure of
> their relevancy outside of laptops.  For frequency limit (i.e. don't go
> below XXXXMHz), there's a loader.conf tunable for that, so I think my
> own rc.conf might need some testing.  Anyway, IT WORKS on my CPUs that
> support at lowest C2 mode (see sysctl cx_supported).

For TurboBoost I know the deeper the sleeping, the better.  So you want 
to use the max for your CPU so it can sleep inactive cores further to 
boost active ones more.
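
To find "the max for your CPU", one option is to read it out of 
dev.cpu.0.cx_supported.  The sketch below assumes the states are listed 
shallowest first and that each entry starts with the state name followed 
by a slash (e.g. "C1/1 C2/80"); deepest_cx is a hypothetical helper name.

```shell
# deepest_cx: print the deepest C-state named in a cx_supported string.
# Assumes entries are ordered shallowest to deepest, e.g. "C1/1 C2/80 C3/104".
deepest_cx() {
    last=""
    # Unquoted on purpose: iterate over whitespace-separated entries.
    for tok in $1; do
        # Keep only the state name, i.e. everything before the first "/".
        last=${tok%%/*}
    done
    printf '%s\n' "$last"
}

# On the FreeBSD box itself (not run here):
#   deepest_cx "$(sysctl -n dev.cpu.0.cx_supported)"
```

The result is the value that would go into performance_cx_lowest and 
economy_cx_lowest in rc.conf.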

>
>> {snipping other stuff}
>>
>> Yeah.  Originally I had even my UPS (APC) disconnected, the only USB
>> device (via a port -- I realize there might be MB virtual ports) was
>> a Dell KB.
>
> I still go to great lengths to use server boards that offer PS/2.  (They
> started using pure USB for a while but the backlash was major)  USB
> keyboard support on FreeBSD is still a joke (sorry Hans, it is).

Heh.  I miss my Keytronic ... and this MB does have PS/2 ...

>
> The USB layer is just too flaky in general -- still -- for me to trust
> it.  I can't tell you how many times I nearly put my fist through the
> LCD of the console when going into single-user mode only to find that
> the USB keyboard didn't function.  After that nonsense, it was back to
> classic PS/2 for me.
>

I have to agree here; I've had massive issues with USB KVMs as well, but 
they also have issues even with BIOS sometimes, so it's hard to tell 
where the blame lies in my eyes.  Fortunately everything is KVM over IP 
now for the most part.