Bizarre clone attempt failures on Raspberry Pi2...

Fri Jul 15 13:36:34 UTC 2016

On Jul 14, 2016, at 11:36 PM, Karl Denninger <karl at denninger.net> wrote:

> On 7/14/2016 13:27, Karl Denninger wrote:
>> On 7/14/2016 12:55, Ian Lepore wrote:
>> No there wasn't.  It was a blank (brand new) card the first time around;
>> it had a MSDOS filesystem on it (as do all new cards) but *no*
>> BSD-specific geom anything on it.
>>> To reliably create a new layout regardless of what may be present
>>> already on the media, you have two choices:
>>> 
>>> 1 - dd zeroes to the entire device
>>> 2 - use the "no commit" feature of gpart
>> Actually in the case at hand #1 isn't impractical since I really only
>> care about the first 100MB or so being zeroed.  The reason is that my
>> boot block (the MSDOS fs) is ~50MB and the label is obviously next, so
>> if we zero the first 100MB we're fine.
>> 
>> And in fact that does work.
>>> When you pass no '-f <flags>' to a gpart command, it automatically adds
>>> the "-f C" (commit) flag behind your back.  There is no "don't commit"
>>> flag, so (this is surrealistically crazy...) what you're supposed to do
>>> is pass an invalid flag, which it won't complain about, in order to
>>> prevent it from automatically adding that 'C' flag you didn't even
>>> realize existed.  (This is where *I* curse whoever coded this mess.)
>>> 
>>> When you don't commit, the changes take place in a sort of 'virtual
>>> workspace' and nothing on the physical disk changes until you do a
>>> "gpart commit" (or "gpart undo" to discard the changes).  Making all
>>> this much less cool than it's sounding right now, there is no automatic
>>> recursion for commit and undo... if you create a bunch of nested stuff
>>> (a slice, a geom within that slice, partitions within that geom), then
>>> you have to commit all the pending new geoms *in reverse order of how
>>> they were created*.
>>> 
>>> So, using da0 (since it's shorter to type), the sequence goes like:
>>> 
>>> gpart destroy -f x -F da0
>>> gpart create -f x  -s MBR da0
>>> gpart add -f x     -t \!12 -s 64M -a 4M da0
>>> gpart add -f x     -t freebsd -a 4M da0
>>> gpart destroy -f x -F da0s2
>>> gpart create -f x  -s BSD da0s2
>>> gpart add -f x     -t freebsd-ufs da0s2
>>> gpart commit da0s2
>>> gpart commit da0
>>> newfs_msdos /dev/da0s1
>>> newfs -U /dev/da0s2a
>>> 
>>> And that reliably creates a fresh rpi-style layout regardless of what
>>> was on the media before you started.
>> Ok, I will try this, BUT I suspect it's still screwed (blind) because
>> when I zeroed the front of the disk I got a "correct" partition layout
>> but after populating it what I get still hangs after it mounts root in
>> the same place.  The way to prevent the alignment issue from coming up
>> is to specify a "-b" switch on the "add", giving you a block offset. 
>> "-b 64" is sufficient; now if the system tries to "taste" da0s2 it will
>> fail (as it does for the card that is running) but "tasting" da0s2a
>> succeeds.
>>> Now, to address the question of the filesystem existing at da0s2 versus
>>> da0s2a, the difference is alignment.  Making things even more
>>> confusing, alignment (if you don't specify it) sometimes changes based
>>> on the type and brand of usb sdcard reader you're using and the fake
>>> geometry values it reports to the system.  (A USB reader almost always
>>> reports different fake geometry than a native sd slot would on a
>>> machine with non-USB based sd support.)
>> Yes, I understand that; if the alignment is such that the "a" partition
>> starts at offset zero then you can actually reference that (although
>> length might be wrong) with the base device.  After all, what it really
>> does is look at the blocks to see if the magic number is good, and if so
>> it tries to read and process it.
>> 
>> But this doesn't explain why, after getting a layout that's correct (by
>> writing zeros to the front of the card first, so anything that "might"
>> be there isn't) and copying all the file structure over (which facially
>> not only appears to be correct but the loader finds and loads the
>> kernel, AND the root filesystem mounts!) the system hangs, apparently
>> just before init gets started.
>> 
>> If init can't be found you should get a complaint (been there, done
>> that) on the console but there is no complaint of any sort.
>> 
>> I've gotten through the bad structure issue on the SD card, and am now
>> left with "why does it hang on boot -- with no error or other indication
>> of what the problem is" after the kernel loads *and* the root filesystem
>> mounts?
>> 
> Found it.
> 
> Apparently the current code *requires* the label be set on the msdos
> partition.  If it's not then not only does it not mount (which shouldn't
> matter post-boot as the loader is supposed to pass the dtb file, it is
> specified in the config file without any sort of path prefix, and thus
> once the kernel has loaded it should not matter if the dos partition is
> actually mounted or not) *but* the boot process hangs without any
> indication of why!
> 
> So, you must do newfs_msdos -L MSDOSBOOT -F 16 {device}
> 
> If the "-L" is missing you're hosed; the system facially appears to be
> just fine but while the loader comes up and so does the kernel, it hangs
> without ever proceeding -- and without any sort of error message
> indicating that it is unable to mount something it needs.

You have to do that because the device entry in the stock /etc/fstab is /dev/msdosfs/MSDOSBOOT.  The /dev/msdosfs part indicates it's using ms-dos labels.  In other words, this is just the same sort of failure you were getting when you weren't labelling the UFS partition as "rootfs".  Labelling the file system properly "fixes" the issue, as you would expect.

It's a misnomer to say the code "requires" labels.  That's just the way the distribution images are currently set up.  I have an older Pi that predates the current distribution images and just uses /dev/mmcsd0... device names in /etc/fstab.  Both approaches work fine.  You just need to make sure the devices you specify in /etc/fstab will actually exist when it comes time to mount the corresponding file system.
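
To make the two approaches concrete, here is a rough sketch of what each style of /etc/fstab could look like.  The label-based entries use the MSDOSBOOT and rootfs labels and the /boot/msdos mount point mentioned in this thread; the mount options are assumptions.  The partition names in the second variant (mmcsd0s1, mmcsd0s2a) are hypothetical, chosen only to mirror the da0s1/da0s2a layout quoted above, and are not taken from any particular machine.

```
# Variant 1: label-based (works only if both file systems carry these labels)
/dev/ufs/rootfs          /            ufs      rw          1  1
/dev/msdosfs/MSDOSBOOT   /boot/msdos  msdosfs  rw,noatime  0  0

# Variant 2: raw device names (hypothetical partition names, no labels needed)
/dev/mmcsd0s2a           /            ufs      rw          1  1
/dev/mmcsd0s1            /boot/msdos  msdosfs  rw,noatime  0  0
```

Only one variant would be used at a time.  With the first, a cloned card has to reproduce the labels; with the second, it has to reproduce the partition layout.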

If you stop using labels in your /etc/fstab then you won't have problems when those labels are missing.  If the labels are missing, the /dev/{msdosfs,ufs} devices will not be present and the system will drop to single-user mode because non-late, non-noauto file systems can't be accessed via their device nodes when attempting to mount them.  When that happens and you don't have a serial console enabled, you will have trouble remediating the situation.
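
One way to catch this before rebooting a freshly cloned card is to check that every device named in the fstab actually exists.  The following is a minimal sketch in POSIX sh, not the logic the rc scripts really use: the function name check_fstab is made up for illustration, and it assumes that only /dev/... entries without the "noauto" or "late" option matter for the early mount pass.

```shell
#!/bin/sh
# check_fstab [fstab-path]
# Print every /dev/... entry that would be mounted automatically at boot
# (no "noauto" or "late" option) but whose device node does not exist.
# Hypothetical helper, only an approximation of what the boot scripts do.
check_fstab() {
    fstab="${1:-/etc/fstab}"
    # The "|| [ -n "$dev" ]" part handles a last line without a newline.
    while read -r dev mnt _fstype opts _rest || [ -n "$dev" ]; do
        case "$dev" in
            ''|'#'*) continue ;;   # skip blank lines and comments
            /dev/*) ;;             # device-backed entry: check it
            *) continue ;;         # e.g. tmpfs or proc, nothing to check
        esac
        case ",$opts," in
            *,noauto,*|*,late,*) continue ;;   # not part of the early mounts
        esac
        [ -e "$dev" ] || echo "missing: $dev ($mnt)"
    done < "$fstab"
}
```

Running `check_fstab` with no argument reads /etc/fstab.  To check a card that is mounted elsewhere, pass the path to its fstab, keeping in mind that the device nodes are then looked up on the machine doing the checking, not on the Pi.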

If a file system is not needed to mount as part of booting (as you suggest for /boot/msdos) then you should probably flag it with the "noauto" option in /etc/fstab or remove it from /etc/fstab entirely.
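
As a sketch of the first option, the entry for the MS-DOS partition would carry "noauto" among its options.  The other options shown are assumptions, not copied from a real image:

```
/dev/msdosfs/MSDOSBOOT   /boot/msdos  msdosfs  rw,noatime,noauto  0  0
```

With that in place, a missing label should no longer interfere with booting, and the partition can still be mounted by hand with "mount /boot/msdos" once the device node exists.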

I think the problem you were having is not copying all the required attributes of the file systems in question when cloning your SD cards, given your /etc/fstab setup.  It sounds like you've fixed that, now.

Cheers,

Paul.

> 
> I can clone cards now.
> 
> -- 
> Karl Denninger
> karl at denninger.net
> /The Market Ticker/
> /[S/MIME encrypted email preferred]/