Unable to shutdown

Kevin Oberman kob6558 at gmail.com
Tue Aug 30 23:10:16 UTC 2011


On Tue, Aug 30, 2011 at 2:48 PM, Jeremy Chadwick
<freebsd at jdc.parodius.com> wrote:
> On Tue, Aug 30, 2011 at 01:29:02PM -0400, David Magda wrote:
>> On Tue, August 30, 2011 11:50, Kevin Oberman wrote:
>> [...]
>> > The more I look at this, the more it seems to me that it is an issue
>> > with the Seagate drive and not a FreeBSD issue. Probably a bug that is
>> > never triggered on Windows, so is largely unnoticed. I suspect Windows
>> > probably issues the commands in a subtly different order.
>> [...]
>>
>> Or not the drive per se, but the USB-to-IDE/SATA chipset.
>>
>> A while back on the OpenSolaris zfs-discuss list there was an issue where
>> USB drives would have corrupt ZFS pools if a drive was yanked without a
>> 'zpool export' being run. Even though ZFS is supposed to always be
>> consistent on-disk (because it's transactional), this wasn't happening.
>>
>> It turned out that the chipset had a list of particular SATA commands that
>> it allowed through to the drive, and all others were simply answered with
>> "OK", regardless of what actual actions needed to be taken. One of the
>> SATA commands that was NOT whitelisted was the 'cache flush'
>> command--which ZFS needs to make sure that its data structures are
>> written in the proper order.
>>
>> Turns out the drive and its firmware were fine and doing things properly,
>> it's just that the necessary commands weren't getting to it because of the
>> USB adaptor's chipset.
>
> I don't think that advice is applicable in this situation.  Here's why:
>
> Kevin's original description indicates that when the drive (or enclosure
> translation ASIC for that matter) is in standby, when the system is shut
> down, the drive/ASIC never spins back up on I/O (flushing all I/O
> buffers to disk).
>
> If he issues "ls" commands or similar userland-induced I/O to the drive
> prior to shutting the system down, the drive/ASIC spins up normally.
>
> Here's Kevin's original quote:
>
>>> The drive is "green" and spins down when idle.  If an attempt is made
>>> to shutdown the system while the drive is spun down, the system goes
>>> through the usual shutdown including flushing all buffers out to disk,
>>> but at the final disk access to mark the file systems as clean, the
>>> drive never spins up and the system hangs until it is powered down.
>>> I've found no way to avoid this other than to remember to access the
>>> disk and cause it to spin up before shutting down.
>>>
>>> If I attempt to unmount the file systems when the drive is spun down,
>>> the same thing happens, but I can recover as the second file system
>>> is still mounted and an ls(1) to that file system will cause the disk
>>> to spin up and everything is fine.
>
> So the question is what's "unique" about flushing all I/O buffers to
> disk during shutdown compared to issuing standard I/O in userland.  I
> can speculate all day as to what the cause is, but it's highly unlikely
> that the USB-to-SATA controller ASIC is causing the problem.

You are perhaps assuming a bit too much. Since I know that a disk read or
write WILL spin up the drive, I can only assume that msdosfs is not finding
anything to flush, so it is not writing. I see the full "flushing all
buffers" countdown and it always runs successfully to zero, all without
the drive spinning up. This at least raises the question of whether the
drive is receiving any writes at all, or whether the "writes" are simply
being cached by the drive to save energy. I suspect that the drive only
spins up when enough of its write cache is filled.

In that case, the "flush cache" might actually be what is issued, but I
can't claim any certainty about that. I'm not willing to completely clear
the USB-SATA chip as the culprit.
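
To make the bridge scenario David described concrete, here is a toy Python
model of a USB-to-SATA bridge that forwards only whitelisted ATA opcodes
and answers "OK" to everything else. The opcode values are the standard
ATA ones; the class names and the whitelist contents are hypothetical, and
nothing here is known about what this particular enclosure actually does.

```python
# Toy model only: class names and whitelist contents are assumptions.
FLUSH_CACHE = 0xE7      # ATA FLUSH CACHE
FLUSH_CACHE_EXT = 0xEA  # ATA FLUSH CACHE EXT (48-bit)
WRITE_DMA = 0xCA        # ATA WRITE DMA
WRITE_DMA_EXT = 0x35    # ATA WRITE DMA EXT (48-bit)


class Drive:
    """Records every command that actually reaches it."""

    def __init__(self):
        self.received = []

    def execute(self, opcode):
        self.received.append(opcode)
        return "OK"


class FilteringBridge:
    """Forwards whitelisted opcodes; answers "OK" to everything else."""

    def __init__(self, drive, whitelist):
        self.drive = drive
        self.whitelist = frozenset(whitelist)

    def send(self, opcode):
        if opcode in self.whitelist:
            return self.drive.execute(opcode)
        # Not forwarded, but the host is still told the command succeeded.
        return "OK"


drive = Drive()
# Hypothetical whitelist that lets writes through but not cache flushes.
bridge = FilteringBridge(drive, whitelist={WRITE_DMA, WRITE_DMA_EXT})
statuses = [bridge.send(op) for op in (WRITE_DMA_EXT, FLUSH_CACHE_EXT)]
```

In this model the host sees two successful commands, while the drive only
ever receives the write.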

> Furthermore, Windows doesn't have "special disk/enclosure drivers" for
> such drives, so there's nothing "unique" Windows would be sending across
> the wire, ATA-protocol-wise, that would explain why Windows works and
> FreeBSD doesn't.  At least that's my opinion.

This is not always quite true, but it is true for the general case. (I
know some WD enclosures do install a custom driver.)
>
> With ATA/SATA, the FLUSH CACHE (0xe7) and -EXT (0xea) (for 48-bit LBAs)
> commands are separate from WRITE DMA (0xca) and -EXT (0x35) (for 48-bit
> LBAs).  Neither FLUSH CACHE command takes LBAs in its input CDB.
> To "flush buffers to disk" I imagine what the kernel should be doing is
> issuing WRITE commands followed by FLUSH CACHE.  The WRITEs should be
> "waking" the drive up.

Should they? As I pointed out above, that is not necessarily the case.
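
The hypothesis can be written down as a small Python sketch: a drive that
absorbs small writes into its cache while spun down, and only spins up
once the cache passes some threshold or a flush is requested. This is a
guess at the behaviour, not documented firmware logic; the class name and
the threshold value are made up for illustration.

```python
class CachingDrive:
    """Toy model of a "green" drive that defers spin-up (hypothetical)."""

    def __init__(self, spin_up_threshold):
        # Threshold in blocks; the real value, if any exists, is unknown.
        self.spin_up_threshold = spin_up_threshold
        self.cached_blocks = 0
        self.spinning = False

    def write(self, blocks):
        # Writes land in the cache; the platters only start when it fills.
        self.cached_blocks += blocks
        if self.cached_blocks >= self.spin_up_threshold:
            self._spin_up_and_commit()

    def flush_cache(self):
        # A flush has to reach the platters if anything is still cached.
        if self.cached_blocks:
            self._spin_up_and_commit()

    def _spin_up_and_commit(self):
        self.spinning = True
        self.cached_blocks = 0


drive = CachingDrive(spin_up_threshold=64)
drive.write(1)                        # e.g. a single small write
spinning_after_write = drive.spinning
drive.flush_cache()
spinning_after_flush = drive.spinning
```

Under these assumptions a small WRITE would not wake the drive at all, and
the first command that needs the platters would be the FLUSH CACHE.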
>
> But wait, there's more.
>
> I want to point out to people that "sleep" and "standby" are two very
> different things (they're separate ATA commands too).  So if you're
> using "camcontrol sleep" you probably should be using "camcontrol
> standby".  The man page is quite clear about the repercussions of the
> former (and in the latter case I can imagine I/O to the drive failing or
> simply timing out given that a bus reset is not performed during
> shutdown TMK).

This is a very interesting point. Note that when this happens, whether at
shutdown or when unmounting the file system, it hangs forever. There is no
timeout.

I should also make one oddity completely clear, just in case my initial
report failed to do so. I have two msdosfs file systems on the disk (along
with an encrypted UFS file system which is not normally mounted). I can
unmount one file system. It no longer shows up as mounted, but the drive
DOES NOT SPIN UP. Only when I attempt to unmount the second FS does the
unmount hang. And, since the system is running normally and the second
file system is still mounted, I can issue a command to read from the disk
and it spins up. (I actually use tcsh command completion to do this by
typing "ls /media/MUSIC/Ctrl-D". The terminal window freezes at that point
for several seconds until the disk is spun up and ready, and then completes
the operation.) Both file systems are then unmounted and the system is
clear.
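
The manual workaround could be scripted along these lines. This is only a
sketch: the function name is hypothetical, it assumes a umount(8) binary
on the PATH and sufficient privileges, and it assumes the directory read
actually reaches the disk rather than being answered from cache.

```python
import os
import subprocess


def wake_then_unmount(mount_points, unmount=None):
    """Read each mount point to provoke a spin-up, then unmount them all."""
    if unmount is None:
        # Default action: call umount(8) and raise if it fails.
        def unmount(path):
            subprocess.run(["umount", path], check=True)

    # Same idea as the ls above: a directory read that should hit the disk.
    for path in mount_points:
        os.listdir(path)

    # Only unmount once every read has returned.
    for path in mount_points:
        unmount(path)
```

It would be called with the list of mount points on the external disk
(for example /media/MUSIC); the optional unmount argument is there so the
ordering can be tried out without touching a real mount.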

Does anyone know what the very last operations of unmount are? Things that
happen AFTER the file system has been removed from all system tables? I'm
guessing it is just to mark the file system as clean (single block write)
and flush the cache. I'm guessing that the write is not going to fill the
cache to the point of triggering a spin-up, so the system THINKS the first
file system is unmounted, but something is still not complete.

Thanks, Jeremy, for the suggestions.
-- 
R. Kevin Oberman, Network Engineer - Retired
E-mail: kob6558 at gmail.com


More information about the freebsd-stable mailing list