UFS Crash and directories now missing

Alejandro Imass ait at p2ee.org
Thu May 3 17:14:53 UTC 2012

On Thu, May 3, 2012 at 9:35 AM, Robert Bonomi <bonomi at mail.r-bonomi.com> wrote:
> Alejandro Imass <ait at p2ee.org> wrote:
> [ megasnip ]
>> > Things to investigate :
>> > - When was the last time this box was rebooted normally? Did it go fine?
>> After I moved the jails to the right place I archived the jails with
>> ezjail-admin and rebooted the server several times, and everything
>> worked as expected.
> Rephrasing -- when was the last time _before_the_problem_was_discovered_
> that the machine was re-booted?

The jails were moved on Friday the 27th, so the last reboot before that
was on Apr 4, and the one before that on Feb 29:

Feb 29 10:18:46 nune reboot: rebooted by aimass
Apr  4 19:45:03 nune reboot: rebooted by aimass
Apr 27 19:47:06 nune reboot: rebooted by aimass
Apr 28 02:03:57 nune reboot: rebooted by aimass

>> > Were the jails created at this time?
>> No. Most of these jails have been operational for over a year on this
>> server without any incidents.
> Clarifying the question -- were the jails created at the time of the last
> _prior_ reboot?  i.e., had the machine been re-booted successfully _after_
> the jails were installed, or was this the _first_ such reboot?

No, not at all. Most of these jails were created last year, but here is
the detail. cmm_php52_1 is the problematic jail with the MySQL. You
will see a recent date on its config file because I recently added
a cpuset as a band-aid to limit the jail's ability to bring down
the whole system, leaving at least a couple of CPUs free so I can
ssh in and shut it down. There is, however, a new jail, corcaribe_php53,
which was the reason we rebooted the server on Apr 4th, just to make
sure that everything would come up OK after a reboot.

-rw-r--r--  1 root  wheel   917 Feb 16  2011 cat58base
-rw-r--r--  1 root  wheel   917 Apr 29  2011 cm_idvida
-rw-r--r--  1 root  wheel   937 Apr  3  2011 cm_website
-rw-r--r--  1 root  wheel   960 May  2 09:48 cmm_php52_1
-rw-r--r--  1 root  wheel  1037 Apr  4 20:00 corcaribe_php53
-rw-r--r--  1 root  wheel   950 Feb 16  2011 http_proxy
-rw-r--r--  1 root  wheel   917 Aug  3  2011 mcs_cat58
-rw-r--r--  1 root  wheel   917 Feb 10  2011 php52base
-rw-r--r--  1 root  wheel   917 Feb 12  2011 php53base
-rw-r--r--  1 root  wheel   877 Dec 27 20:33 pyugmao
-rw-r--r--  1 root  wheel   877 Mar 21 22:03 testbed
-rw-r--r--  1 root  wheel  1017 May 13  2011 yabarana_cat58
-rw-r--r--  1 root  wheel  1017 Feb 13  2011 yabarana_php52
-rw-r--r--  1 root  wheel  1017 Feb 13  2011 yabarana_php53

> It appears you misunderstood the 'at this time' reference -- it did not
> mean 'at the time of the incident', but 'at the time of the last prior
> reboot'.  If English is not your primary language, it is an understandable
> misread.
>> As I told you earlier, this server has been running for over a year
>> and we have rebooted many times.
> I don't believe you ever mentioned that particular point (multiple
> successful reboots after installation) before.  Repeating a prior
> question, _how_long_ before the problem showed up was the most recent
> re-boot?  (Doesn't have to be exact -- an 'order of magnitude' estimate
> [a day, a week, a month, several months] is sufficient.)

Apr 4th

>>                                  If such problems exist, they would
>> have to come from using the EzJail commands, and I find that unlikely.
> What you 'find unlikely' is irrelevant.  The entire situation is 'unlikely',
> yet it happened.  So one -has- to look at unlikely things.  <wry grin>


>> Here is the mount output, if that's of any help:
> [ first disk, and 'fdescfs', and 'procfs' references removed, for clarity ]
>> /dev/ad6s1.journal on /usr/jails (ufs, asynchronous, local, gjournal)
>> /usr/jails/basejail on /usr/jails/yabarana-php53/basejail (nullfs,

> Yes, that is a good start at useful detail.  It is, presumably, _after_
> the problem, and _after_ you had restored things to their proper places.


> Is it safe to assume that you do -not- have such a 'mount' output from
> some time 'before' the problem?  ( There's no rational reason why you
> -would- have such, but _if_ it existed, and there were any differences
> between 'then' and 'now', it could be very informative.)

No, but from what I remember it was very similar. If needed, I can pull
similar mount output from other server(s) where we run similar set-ups
that have never failed.

> Another critical piece of information is what directories -- by full path
> name -- disappeared from 'where they were', and where -- by full path name,
> again -- did you find them, and _with_what_names_?   If everything was
> moved from the same source point to the same destination, it's not necessary
> to itemize each one, but the details of _one_ 'typical' migration are needed.
> It is also significant if there was 'anything else' in the 'where they
> belonged' directory that was -not- moved.  *OR* if there was anything else
> (something other than the '/' of a jail) there, that was _also_ moved.

I took a screenshot because I somehow suspected no one would believe
me. I don't know if I can attach it here, but if not I can send it to
you privately.

> "Narrative" descriptions, such as those previously provided, while clear to
> someone familiar with the machine in question, are not sufficiently precise
> to allow an 'outsider' to follow the events without 'logically' replicating
> the setup, and then guessing at the meaning of any shorthands employed.

OK. I can provide almost any information required.

> One comment: for 'defensive' purposes it would be useful to break ad6 up
> into two slices, putting 'basejail' in its own slice.  Then, for production
> use, that slice can be mounted RO, and with the 'system immutable' flag
> set on everything in that filesystem.

Yes. That became somewhat clear to me from one of your posts: having
all the jails on a single 150GB slice seems like a bad idea.

Thanks! Let me know if I can provide anything else to help determine
the root cause.

Alejandro Imass

More information about the freebsd-questions mailing list