Unable to shutdown

Wed Aug 31 07:12:11 UTC 2011

On Tue, Aug 30, 2011 at 11:04:43PM -0700, Kevin Oberman wrote:
> >> On Tue, Aug 30, 2011 at 2:48 PM, Jeremy Chadwick
> >> <freebsd at jdc.parodius.com> wrote:
> > instead use UFS2 and see if the problem disappears?  This is in no way a
> > permanent solution.  If this workaround fixes the problem, then I'm
> > inclined to believe msdosfs is to blame.  There has been a lot of
> > discussion of this driver in the kernel as of late, and the general
> > opinion of it is that it's crummy.
> 
> Actually, for me it is, as I will shortly be re-partitioning this into
> a GPT disk without any msdosfs partitions. I will give it a try with a
> UFS partition tomorrow and see what happens.
> 
> When you say that it is crummy, are you referring to the USB driver,
> the AHCI driver, or the msdosfs support? I have long been concerned
> about the latter due to occasional unstable behavior that is "fixed" by
> booting Windows. fsck_msdosfs seems to do some questionable things, too.

I was referring to msdosfs support in the FreeBSD kernel.  I'm still not
so sure about the USB stack (some things seem to be better now as a
result of the re-write that happened during the 7.x -> 8.x days, but
other things may still be awry); I don't tend to use any USB devices on
FreeBSD.  As for AHCI, I have no complaints at all, although AHCI
shouldn't be involved when it comes to a USB-connected SATA hard disk.

> > And here's another thought: what if the issue is limited, somehow, to
> > just writes?  Meaning, could the kernel issue a "false" read to the
> > device (for some random LBA, even LBA 0 for all I care) and then proceed
> > with its write/flushing?  I wonder if that would cause the drive to spin
> > up first.  That would be a "quirk" in my opinion.
> 
> Interesting idea, but I really doubt that it's an issue with the write
> other than that the drive may not leave standby unless the cache is
> full enough that it flushes.

I'm not sure what you mean by the last part of the sentence, but the
former is something I'm in agreement with.  I doubt adding a "fake read"
prior to issuing writes and flushes during shutdown would make any
difference.  I'm just surprised the writes being made are not causing
the drive to spin up.

> > There's also the possibility the USB stack on FreeBSD is doing something
> > really stupid... man, I don't even want to go down that road.  Hans
> > should be able to help determine if that's the case, but not using
> > msdosfs as a test would be a good start.
> 
> Yes. I make no claim to understand the USB layer at all, but I do
> understand that it is very tricky. Lots of evidence of that in how
> broken early Microsoft USB stacks were.

FreeBSD has gone through at least two major versions of a USB stack.
The stack in the 4.x days did not impress me -- I tried working on
Logitech USB camera support, but could not get alternative indexes to
work -- ugen(4) returned bizarre error conditions for things that
absolutely should have worked.  I did contact the stack maintainer, but
I would rather not go into the discussion that ensued as a result.

Said USB stack improved slightly from 4.x to 7.x.  An entire re-write
was then performed (at the time called "USB2", not to be confused with
the USB 2.0 protocol), which is what's in use in RELENG_8 today.  There
have been at least 3 different maintainers of the FreeBSD USB stack,
each at a different time and working independently of the others.

I don't want my comments to make anyone think the problem described here
is in the FreeBSD USB stack.  I'm just stating some history for those
wondering about it, especially given the comments about Microsoft's
early USB stacks (particularly during the original Windows 95 days and
some other issues during the Win98 era).  My opinion/experiences are my
own.

The problem is that I don't know how to rule the USB stack out when it
comes to diagnosing the problem you're having.  There is the USB_DEBUG
option in one's kernel config which may or may not provide some
insights, but I imagine it's quite chatty and would justify the need for
a serial or FireWire console given the amount of console output.

> > So I'm pretty sure the kernel is iterating over whatever cache buffers
> > there are for I/O (I don't know what this is called technically) and
> > issuing WRITE DMA or -EXT and either waiting for a non-error response
> > from the device or issuing it blindly followed by a FLUSH CACHE or -EXT
> > (either once per write or at the very end).
> 
> Again, I really believe that the kernel fully believes that all writes
> are complete, at least to the disk cache. At that point the FS
> structures can be removed and the FS is no longer mounted as seen from
> the perspective of the system; this MUST be done before the disk cache
> is flushed and the FS is marked "clean". I suspect, but don't know for
> sure, that the last two operations performed are to mark the drive
> clean and then do a cache flush. Of possible relevance is that none of
> the file systems is marked "clean" during a hung shutdown. All need to
> be FSCKed, although nothing ever seems to need fixing by fsck(8).

I understand.  It may be that the unmounting process isn't doing
something that it should be (again this is code/framework which I am not
familiar with).

Regarding lack of clean bits being set on UFS filesystems that happen to
exist on the same machine -- I assume these filesystems are on
completely different disks (e.g. not the USB-attached SATA disk).  If
so, it may be that the kernel is spinning waiting for the USB-attached
SATA disk first, and will eventually flush remaining I/O for the other
disks once it finishes with the USB-attached one.
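
As a rough illustration of that guess only, the toy Python model below
walks a list of filesystems in order and stops at the first one whose
device does not respond.  The function name, the filesystem names and the
ordering are all hypothetical; nothing here is taken from the FreeBSD
shutdown code, which may well behave differently.

```python
def filesystems_marked_clean(order, device_responds):
    """Toy model of a strictly sequential shutdown sync.

    order:           filesystem names in the (assumed) order they are synced
    device_responds: dict mapping each name to True if its device answers

    Returns the names that would get their clean bit set before the
    first unresponsive device is reached.
    """
    clean = []
    for name in order:
        if not device_responds[name]:
            # A real kernel would block here indefinitely; the model just
            # stops, so everything later in the list stays "dirty".
            break
        clean.append(name)
    return clean


# Hypothetical layout: the USB-attached disk happens to be handled first.
usb_first = ["usb_msdosfs", "ufs_root", "ufs_usr"]
responds = {"usb_msdosfs": False, "ufs_root": True, "ufs_usr": True}
print(filesystems_marked_clean(usb_first, responds))
```

If the unresponsive device comes first, no filesystem is marked clean,
which would match the "everything needs fsck" symptom.  If it came last,
the UFS filesystems would be clean, so the observed behaviour at least
hints at the ordering.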

It sounds like some debugging code needs to be inserted during the
"kernel shutdown" phase to find out what's actually going on, rather
than just printing vnode number counts.  I have not looked at the code,
so there may be some debugging code already there if you boot verbose.
Not sure.

And just to make it clear what I'm referring to, re: vnode number
counts:

Waiting (max 60 seconds) for system process `vnlru' to stop...done
Waiting (max 60 seconds) for system process `bufdaemon' to stop...done
Waiting (max 60 seconds) for system process `syncer' to stop...
Syncing disks, vnodes remaining...2 0 1 0 1 0 0 0 done
All buffers synced.

I imagine that in your situation Kevin, the "done" message during the
syncing disks phase is never shown.  But what about the vnode count
numbers?  Are they always non-zero and never drop to 0, or are they
always 0 and just indefinitely repeat with no "done" message?
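
To make the distinction concrete, here is a minimal Python sketch that
classifies a captured "Syncing disks" line (for example, one copied from
a serial console log).  The function name and the labels it returns are
made up for illustration; only the message prefix is taken from the
console output above.

```python
SYNC_PREFIX = "Syncing disks, vnodes remaining..."


def classify_sync_line(line):
    """Classify one 'Syncing disks, vnodes remaining...' console line.

    Returns one of these illustrative labels:
      'completed'   - the line ends with 'done'
      'always_zero' - no 'done', and every count shown is 0
      'never_zero'  - no 'done', and every count shown is non-zero
      'mixed'       - no 'done', with both zero and non-zero counts
      'no_counts'   - no 'done', and no counts were printed at all
    """
    if not line.startswith(SYNC_PREFIX):
        raise ValueError("not a 'Syncing disks' line")
    tokens = line[len(SYNC_PREFIX):].split()
    if tokens and tokens[-1] == "done":
        return "completed"
    counts = [int(tok) for tok in tokens if tok.isdigit()]
    if not counts:
        return "no_counts"
    if all(c == 0 for c in counts):
        return "always_zero"
    if all(c != 0 for c in counts):
        return "never_zero"
    return "mixed"


print(classify_sync_line(
    "Syncing disks, vnodes remaining...2 0 1 0 1 0 0 0 done"))
```

A hung shutdown would show up as any label other than 'completed', and
which one it is answers the question above.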

> > What needs to happen here is that those wanting to participate in this
> > ATA protocol discussion *NEED* to familiarise themselves with the
> > ATA8-ACS specification.  Please PLEASE **PLEASE** take the time to do
> > this before questioning.
> >
> > http://www.t13.org/Documents/UploadedDocuments/docs2007/D1699r4a-ATA8-ACS.pdf
> >
> > Section 4.18.3 contains a flow-chart diagram that is difficult to
> > understand, so I'll summarise:
> >
> > PM0 state = ACTIVE state -- spun up and ready to handle any I/O of any kind
> >
> > PM1 state = IDLE state -- this does not mean "the drive is sitting there
> > idle doing nothing".  There is an ATA IDLE command that can be used to
> > tell the drive to go into a "lower-power" state.
> >
> > PM2 state = STANDBY state -- this equates to "camcontrol standby".  This
> > is what people here are describing as "the drive has spun down".  Or,
> > well, I sure hope that's what people are describing, because "sleep" is
> > not the same thing as "standby".
> >
> > PM3 state = SLEEP state -- this equates to "camcontrol sleep".  It's
> > permanent until the entire bus is reset or the physical device is
> > power-cycled (which of the two works varies from device to device).
> >
> > So with those definitions, you can see quite clearly the documentation
> > states what should happen when transitioning from one state to another.
> > Specifically this is the one that matters (PM2 --> PM0 state):
> >
> > Transition PM2:PM0: When a media access is required, the device shall
> > make a transition to the PM0:Active mode.
> >
> > Now as for drives which may be in IDLE mode (I'm not sure if FreeBSD
> > makes use of that mode automatically or not), it's the same thing:
> >
> > Transition PM1:PM0: When a media access is required, the device shall
> > make a transition to the PM0:Active mode.
> >
> > So that answers the question: any I/O (read or write) to the device
> > should spin the drive up.  If you have an enclosure or an ASIC that is
> > screwing this up (I highly doubt it, and this is not the same problem as
> > what David was describing!), then it's in violation of the ATA protocol.
> 
> Nice description. I understand it, but the standard does not specify EXACTLY
> what triggers a transition from standby to ready (PM2 to PM0). Only that it is
> something that requires media access. A write does not necessarily require
> media access if you define "media" as the disk platter.

You're correct -- "media access" could mean, literally, "accessing the
platter" OR it could mean "LBA read/write I/O".  Then comes into
question whether or not the drive returning something from its on-board
cache would count as "media access" or not.

T13 should probably clarify this point, and it is one I do not have an
answer for myself.  I strongly believe "media access" means "LBA
read/write I/O", regardless of whether the data is in the disk's
on-board cache or not.  I wonder if this behaviour varies per drive model.
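
The two readings of "media access" can be put side by side in a small
Python sketch.  This is a toy model for discussion, not drive firmware
and not text from the specification: the function name, the
`served_from_cache` flag and the two interpretation labels are all
assumptions made for illustration.  Only the PM0 to PM3 names and the
"media access required" transitions come from the summary quoted above.

```python
from enum import Enum


class PowerMode(Enum):
    ACTIVE = "PM0"
    IDLE = "PM1"
    STANDBY = "PM2"
    SLEEP = "PM3"


def mode_after_io(mode, served_from_cache, media_access_means="lba_io"):
    """Return the power mode after one read/write command (toy model).

    media_access_means selects the interpretation being modelled:
      'lba_io'  - any LBA read/write counts as media access
      'platter' - only I/O that must touch the platters counts
    """
    if mode is PowerMode.SLEEP:
        # Per the summary above, SLEEP persists until a reset or power cycle.
        return PowerMode.SLEEP
    if media_access_means == "lba_io":
        needs_media = True
    elif media_access_means == "platter":
        needs_media = not served_from_cache
    else:
        raise ValueError("unknown interpretation: %r" % media_access_means)
    # PM1:PM0 and PM2:PM0 transitions: media access forces ACTIVE.
    return PowerMode.ACTIVE if needs_media else mode


# A small write absorbed by the drive's cache while in STANDBY:
print(mode_after_io(PowerMode.STANDBY, True, "lba_io"))   # PowerMode.ACTIVE
print(mode_after_io(PowerMode.STANDBY, True, "platter"))  # PowerMode.STANDBY
```

Under the 'platter' reading, a single-block write that fits in the
drive's cache would leave the drive spun down, which is the behaviour
Kevin suspects; under the 'lba_io' reading it would not.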

> >> I should also make one oddity completely clear, just in case my
> >> initial report failed to do so. I have two msdosfs file systems on the
> >> disk (along with an encrypted UFS system which is not normally
> >> mounted). I can dismount one file system. It no longer shows up as
> >> mounted, but the drive DOES NOT SPIN UP. Only when I attempt to unmount
> >> the second FS does that unmount hang. And, since the system is running
> >> normally and the drive is still mounted, I can issue a command to read
> >> from the disk and it spins up. (I actually use tcsh command completion
> >> to do this by typing "ls /media/MUSIC/Ctrl-D". The terminal window
> >> freezes at that point for several seconds until the disk is spun up
> >> and ready and then completes the operation.) Both disks are then
> >> unmounted and the system is clear.
> >>
> >> Does anyone know what the very last operations of unmount are? Things
> >> that are AFTER the system has been removed from all system tables? I'm
> >> guessing it is just to mark the system as clean (single block write)
> >> and flush the cache. I'm guessing that the write is not going to fill
> >> cache to the point of triggering a spin-up, so the system THINKS the
> >> first drive is unmounted, but something is still not complete.
> >
> > This is really starting to sound like idiocy within the msdosfs driver.
> > That's just my opinion at this point.  As for what happens during device
> > unmount, I believe it's handled per-device (per-layer) as well as
> > per-filesystem.  Kirk McKusick might have some insight to this --
> > filesystems aren't something I'm really well-versed in.
> 
> Yes, you are right. I'll find out when I try it out tomorrow. Kirk
> almost certainly does know since this is relevant to ANY file system.

Cool, I look forward to Kirk's input -- or anyone's input for that
matter.  The worst time for a system to become "wedged" like this is
during shutdown, because by that point who knows what kernel pieces are
shut off (makes debugging possibly very difficult).

> > Sorry for sounding crass, but I really grow tired of people "blaming
> > hardware" willy-nilly when in my experience most of these wonky problems
> > turn out to be bugs/issues in FreeBSD.  Anyone who thinks this OS is
> > infallible is smoking some serious crack.
> 
> I really know that the FS is far less than perfect, but the fact that
> the two reports of this sort of behavior both involve USB drives from
> the same manufacturer and probably running identical firmware does tend
> to point to a hardware issue. It's certainly not proof.

Well we need to figure out what's going on here.  I would love to blame
the ASIC used inside the enclosure for USB-to-SATA conversion, but it's
just as possible that the issue may happen if you take the disk out of
the enclosure and hook it up to a native SATA port and issue "camcontrol
standby adaX", you know?

I believe these enclosures are sealed and not intended to be opened by
consumers, else they void the warranty, correct?  If I'm wrong, someone
should probably open one up and try the above procedure.

Can you provide me the *exact* model of Seagate enclosure this is?  Name
of the product, model and part number/SKU, everything like that?  I will
be happy to purchase one for myself and stick it on my home FreeBSD box and
experiment.

The only USB-attached SATA drive I have is a Toshiba MK5055GSX (500GB,
2.5", 5400rpm, SATA300), in a Tango Blue USB 2.0 enclosure that's from
Acomdata (the Toshiba enclosure the drive came with was utter crap,
drive would get too warm for my liking).  I exclusively use the USB
port, not the eSATA port.  I can test with that if folks want me to do
so.

My drive:
http://sdd.toshiba.com/main.aspx?Path=StorageSolutions/PCNotebookHardDrives/MKxx55GSXSeries

My enclosure:
http://www.acomdata.com/p-294-tango-blue-usb-20-esata-portable-enclosure.aspx

In the meantime, ruling out the msdosfs driver would be a good start,
assuming someone has the time.  My Toshiba drive, for sake of
comparison, uses NTFS.

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                   Mountain View, CA, US |
| Making life hard for others since 1977.               PGP 4BD6C0CB |