UFS Crash and directories now missing

Alejandro Imass ait at p2ee.org
Thu May 3 11:56:13 UTC 2012


On Mon, Apr 30, 2012 at 6:42 PM, Jerome Herman <jherman at dichotomia.fr> wrote:
[...]

> I must admit that Robert Bonomi's tone was highly insulting for this list, and
> though I completely condemn the form of his post, I cannot say I disagree
> with the content.
>

I disagree with both the form and the content, and I will tell you why
later... I do appreciate, however, the time you and everyone else
(including Robert Bonomi) have taken to answer and post such
lengthy insights. I believe everyone's opinion is important and should
be respected.

> There are quite a lot of things that are wrong with Alejandro Imass' post
> and analysis.
> The first thing is that he did not give his setup in one go. It took quite a
> while to figure out what happened, what system he was using and how he was
> using it.
> At first he had to hard reboot an unresponsive system, then at reboot he
> had apparently lost all of his jails.
> Then it appeared that all the jails were inside another jail and that the
> unresponsiveness came from MySQL.
> Then we learned that all his daemons are inside jails.
> Then we learned that ftp-proxy is not.
> Then we learned that jails are not handled manually but through EZJail.
> Then we were told that the problem with MySQL is known and comes from a
> client using TigerCRM with too much data.
> There are literally dozens of little pieces of important knowledge all over
> the thread, and you have to read it all to make sure you have the global
> view. Not really a good start.
> It is OK to forget to mention a thing or two, discarding what you think is
> irrelevant to the problem at hand, but it is not OK to force people who are
> trying to help you to read 50+ posts to learn about the basics of your
> installation.
>

Granted. Nevertheless, the EzJail part (which I admit was a very
important piece of information) was left out of my first and second
posts but was established in the third post, so it came quite early in
the thread.

I think that it is not hard to put yourself in my shoes, and
understand that in a moment of crisis, your first priority is NOT
articulating the most complete and technical bug report you can. On
the contrary, it's a cry for help from your peer users to see if you
can gain some insight on solving the problem as quickly as possible.

> What is even more irritating is the fact that Alejandro Imass ignores pretty
> much anything that would lead toward a human mistake. Most posts implying a
> possible bad use of jails/nullfs/ezjail are ignored or answered by a simple
> "I have done everything by the book". Now, from my experience, someone with 6
> servers, each containing multiple jails will not do everything by the book
> every time. It might be that Alejandro is exceptional, but it is more likely

Well, we do run everything by the book, precisely to avoid problems.
We find one recipe that works and stick to it like religion. I have
only used EzJail commands and made **normal** use of EzJail. I am not
expected to know _exactly_ how it works; I trust that to the experts
in each field. As a user I am only expected to RTFM and use it
accordingly.

Again, let me remind everyone here that this list is precisely for that:
FreeBSD ***GENERAL QUESTIONS***. It is NOT a technical list. When you,
Robert Bonomi and everyone else here subscribed to this
particular list, it should have been pretty clear:

- General lists: The following are general lists which anyone is free
(and encouraged) to join:
- freebsd-questions: User questions and technical support
- About freebsd-questions (English (USA)): This is the mailing list for
questions about FreeBSD. You should not send "how to" questions to the
technical lists unless you consider the question to be pretty
technical.

So I am entitled to post general questions and provide information as
I see fit, or as an expert on the list asks for more. When I sent the
first few posts, that was all the information I had. If you thought you
needed more information, you should have said so; instead you made an
a priori judgment call, which I found almost as insulting as all of
Bonomi's posts, so I simply ignored you.

In retrospect, and after re-reading your first post and this one, I
can understand that leaving EzJail out of the first post omitted a key
piece of information that would probably have caused you to answer
very differently, so I can somewhat justify your initial post. But as
I saw it at that moment, you should have already known I was using EzJail.

> that at least one if not more of these jails were not made "by the book".
> No one is to blame for that here, we all get tired/bored/overconfident
> sometimes, but refusing to admit the very possibility of a human mistake
> won't help at all in finding a solution. Reading the thread I realized that
> my suggestion that he might have over-used "ln" had been discarded as
> "stupid", but the information came a lot later in answer to another post. Of

Yes, I must apologize for having ignored your post, but I found your a
priori *assumption* of human error almost as insulting as Robert
Bonomi's posts. If I had done something that I thought could have
contributed, I would have said so. Do you think I would come here and
post something blaming the system when in fact I thought
that human error was involved? I find that insulting. Do you think I
am afraid to say "I screwed up, please help"? These a priori
assumptions about people are what pissed me off, especially because
it's not the first time I have posted on this list.

> course in the meantime I learned that he was using ezjail, which, if I had
> known earlier, would have made me wonder if he had not overused nullfs or
> ln. He furthermore discarded the possibility, saying that he did not think
> that ezjail was using links, just nullfs. Well, too bad: ezjail makes massive
> use of links, at least for basejail, and sometimes for port trees or perl
> setup, depending on which guide you are using as your reference.

The expectation that as a user I am required to know whether EzJail uses
nullfs or links, or the "source of fsck", is wrong. Some of this
information is in the manual pages, some is scattered around, and yes,
there are several guides, but ultimately it all comes down to using
the EzJail commands. So when I say I use it by the book, it is because I
haven't done anything outside the EzJail commands, nor have I abused
nullfs, links or whatever.
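Whether the "overused ln" theory applies to this box can be checked rather than argued. Below is a minimal sketch, not an ezjail tool: `symlinks_into_other_jails` is a hypothetical helper that walks one jail directory without following links and reports symlinks whose target, read from the host's point of view, lands inside a different jail directory. It assumes all jails sit directly under one root such as /usr/jails, and that `basejail` and `newjail` are ezjail's shared template directories, so links into those are treated as normal.

```python
import os
from typing import List, Tuple

# Assumption: ezjail's shared directories, not real jails; links into them are expected.
SHARED_DIRS = ("basejail", "newjail")


def symlinks_into_other_jails(jails_root: str, jail: str) -> List[Tuple[str, str]]:
    """Return (link path, raw target) pairs for symlinks under jails_root/jail
    whose target, as seen from the host, falls inside another jail directory."""
    root = os.path.normpath(jails_root)
    hits = []
    # os.walk does not descend into symlinked directories by default,
    # but it still lists them in dirnames, so every link is inspected once.
    for dirpath, dirnames, filenames in os.walk(os.path.join(root, jail)):
        for name in dirnames + filenames:
            link = os.path.join(dirpath, name)
            if not os.path.islink(link):
                continue
            raw = os.readlink(link)
            # Host view: absolute targets are kept as-is; relative targets are
            # resolved lexically against the directory containing the link.
            target = os.path.normpath(os.path.join(dirpath, raw))
            rel = os.path.relpath(target, root)
            if rel in (os.curdir, os.pardir) or rel.startswith(os.pardir + os.sep):
                continue  # the jails root itself, or somewhere outside it
            first = rel.split(os.sep, 1)[0]
            if first != jail and first not in SHARED_DIRS:
                hits.append((link, raw))
    return sorted(hits)
```

For example, `symlinks_into_other_jails("/usr/jails", "http-proxy")` would list any link in that jail pointing into a sibling jail. It is deliberately a host-side, lexical check: inside the jail an absolute link such as /basejail/bin resolves against the jail's own root, so only what host-side tools would follow is reported.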

I find it incredible that this is "the exception". I think the
majority of people that use FBSD *should* be using it by the book.

> During the thread he pretty much bashed anyone who tried to tell him that no
> amount of jail/ezjail/nullfs/journal screw up could have resulted in the
> entire content of the jails being moved into another completely unrelated
> directory node. If one jail had moved it would already have been
> extraordinary: the odds of it happening so cleanly that fsck would find
> nothing are already orders of magnitude worse than the chances of winning the
> national lottery. But all of them? Not a chance. He finally admitted that
> he had very little knowledge about UFS and fsck, but still managed to do it
> in a quite offensive way.
>

This is false. I have only been offensive (defensive, actually) toward
Robert Bonomi.

> That was basically the point where I decided to stop trying to help him. I
> think others felt the same. This problem is quite interesting in itself,
> and I think a lot of the most talented people on this list would have been
> on it but were repelled by the attitude.
>

Sorry, but this is false. At this point what I see is just you
justifying yourself after the fact, now that a possible problem with
nullfs has resurfaced. Your prior assumptions, like:
- "Nothing even remotely rings a bell."
- "most of us will be inclined to think that you did something wrong."
- "Extremely unlikely."

were wrong to begin with, and your attitude was wrong from the start,
so now you have to come here and turn this on me, when in fact it was
only after several insulting threads that I ranted away, and not
even against you.

> On the other hand Alejandro Imass pretty much jumped on anything that would
> point to third-party interaction, from someone hacking into his box to a
> potential nullfs bug that might result in a PR.
>

The problem with living in these little worlds is not being able to
picture oneself in the other person's shoes. If you had suddenly lost
more than a dozen jails without having done anything more than reboot
the system, I am pretty sure you would see this from another
perspective.

> Now the thing is that EZJail makes use of the "system immutable flag" quite a
> lot for its config files, resulting in quite a lot of files being impossible
> to delete or move unless the box is running at kern.securelevel 0 or lower. This
> renders the whole "jails moved on their own" theory even more improbable.
>

Believe what you want and you are entitled to that; it's your right
and your opinion. But regardless of what you believe, this is
something that actually happened. You don't hold the truth, only an
opinion.
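The immutable-flag argument can also be verified on the live system instead of assumed. A hedged sketch, assuming Python 3 on the FreeBSD host: the hypothetical helper `immutable_paths` lists every path under a directory whose flags include the system-immutable bit (`stat.SF_IMMUTABLE`, the flag that `chflags schg` sets). The current level itself can be read with `sysctl kern.securelevel`. On platforms whose `stat` results carry no `st_flags`, the scan simply finds nothing.

```python
import os
import stat
from typing import List


def immutable_paths(top: str) -> List[str]:
    """Return paths under `top` carrying the system-immutable (schg) flag."""
    flagged = []
    for dirpath, dirnames, filenames in os.walk(top):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # lstat: inspect links themselves, do not follow
            except OSError:
                continue  # vanished or unreadable; skip rather than abort the scan
            # st_flags exists only on BSD-style platforms; treat it as 0 elsewhere.
            if getattr(st, "st_flags", 0) & stat.SF_IMMUTABLE:
                flagged.append(path)
    return sorted(flagged)
```

Running it against a jail root, e.g. `immutable_paths("/usr/jails/http-proxy")`, would show which files, if any, could not have been moved while the securelevel was above 0.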

> After so much ranting, I would feel bad not to try to help a little.
> Here are the facts:
> - In a jail, MySQL was grabbing all the CPU and making the box
> unresponsive. This is due to TigerCRM making requests to an overly large database.
>        -> The jail was working
>        -> Unless all the data were in memory at this time (improbable), it
> means that access path/nullfs/EZJail were OK at this time.
>
> - After a force reboot all the jails were gone, or more exactly moved inside
> another jail. fsck saw no error on the disk.
>        -> The disk was in a stable state at reboot, the directory and file
> structure was consistent.
>
> - Jails contained in the apache jail were in an OK state and could be
> archived and restored
>        -> The data structure of the hard drive was clean, and file contents
> were OK.
>
> From all this, here is what we can safely assume:
> a) The box was not hacked, or at least the hacker did not move the jails
> around; this is confirmed by MySQL working and doing enough I/O to stall the
> box from inside a jail that was later seen as moved.
> b) The hard reboot did not cause a problem, it revealed it. Since both fsck
> runs were fine and the data were preserved, we can pretty safely assume that
> there was no data or system corruption caused by the hard reboot.
>

Correct.

> Things to investigate:
> - When was the last time this box was rebooted normally? Did it go fine?

After I moved the jails to the right place I archived the jails with
ezjail-admin and rebooted the server several times, and everything
worked as expected.

> Were the jails created at this time?

No. Most of these jails have been operational for over a year on this
server without any incidents.

> - What happens if you deactivate the jail that "survived" and reboot
> normally? Would the other jails contained in it start? If you deactivate the
> jail but leave the nullfs mapping on and try to restart EZJail, do the
> other jails start?

After moving the jails to the correct directory, and rebooting several
times everything has worked as expected. There have been no incident
reports on any of the 14 jails and most contain relatively complex
systems, databases, etc.

> - What are the contents of the various fstab.* files and of the EZJail conf?
> Does any of it point inside the jail that survived the reboot?
>

Not that I know of but I can provide this information in a follow-up
post. All the jails were created with ezjail-admin. I do use
"flavours" quite extensively.
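For the fstab.* question, a small checker can flag exactly the pattern being suspected: an entry in one jail's fstab whose mount point lies outside that jail's root, or whose source or mount point lies inside some other jail's root. This is an illustrative sketch; `check_jail_fstab` is a hypothetical name, not an ezjail command. It assumes the standard whitespace-separated fstab(5) fields and leaves the per-jail file names under /etc to be confirmed on the box.

```python
import posixpath
from typing import List


def _under(path: str, root: str) -> bool:
    """True if `path` is `root` or lies below it (lexical comparison only)."""
    path, root = posixpath.normpath(path), posixpath.normpath(root)
    return path == root or path.startswith(root.rstrip("/") + "/")


def check_jail_fstab(owner_root: str, fstab_text: str, jail_roots: List[str]) -> List[str]:
    """Flag entries in one jail's fstab that reach outside that jail's root
    or touch any other jail's root."""
    others = [r for r in jail_roots if posixpath.normpath(r) != posixpath.normpath(owner_root)]
    problems = []
    for lineno, raw in enumerate(fstab_text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments; skip blank lines
        if not line:
            continue
        fields = line.split()
        if len(fields) < 3:
            problems.append(f"line {lineno}: malformed entry: {raw.strip()}")
            continue
        source, mountpoint = fields[0], fields[1]
        if not _under(mountpoint, owner_root):
            problems.append(f"line {lineno}: mount point {mountpoint} is outside {owner_root}")
        for other in others:
            if _under(source, other) or _under(mountpoint, other):
                problems.append(f"line {lineno}: {source} on {mountpoint} touches {other}")
    return problems
```

The comparison is purely textual, so a typical basejail line such as `/usr/jails/basejail /usr/jails/a/basejail nullfs ro 0 0` passes, while a nullfs mount sourced from another jail's tree is reported.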

> Unfortunately, since the server was "corrected", we probably won't have a
> satisfying answer. But honestly the probability of a system bug is really
> low. Very likely the "moved" jails were inside the surviving jail from the
> beginning, and a mix of nullfs remap and lack of reboot masked this fact for
> a while.
>

As I told you earlier, this server has been running for over a year
and we have rebooted it many times. If such problems exist, they were
created by the EzJail commands themselves, and I find that unlikely.
We've been using EzJail extensively and this is the first time we've
had any problems.

Here is the fstab of the base system:

# Device		Mountpoint	FStype	Options		Dump	Pass#
/dev/ad4s1b		none		swap	sw		0	0
/dev/ad4s1a		/		ufs	rw		1	1
/dev/ad4s1d		/tmp		ufs	rw		2	2
/dev/ad4s1f		/usr		ufs	rw		2	2
/dev/ad4s1e		/var		ufs	rw		2	2
/dev/ad6s1.journal	/usr/jails	ufs	rw,async	2	2
/dev/cd0		/cdrom		cd9660	ro,noauto	0	0

Here is the mount output, if that's of any help:

/dev/ad4s1a on / (ufs, local)
devfs on /dev (devfs, local, multilabel)
/dev/ad4s1d on /tmp (ufs, local, soft-updates)
/dev/ad4s1f on /usr (ufs, local, soft-updates)
/dev/ad4s1e on /var (ufs, local, soft-updates)
/dev/ad6s1.journal on /usr/jails (ufs, asynchronous, local, gjournal)
/usr/jails/basejail on /usr/jails/yabarana-php53/basejail (nullfs, local, read-only)
devfs on /usr/jails/yabarana-php53/dev (devfs, local, multilabel)
fdescfs on /usr/jails/yabarana-php53/dev/fd (fdescfs)
procfs on /usr/jails/yabarana-php53/proc (procfs, local)
/usr/jails/basejail on /usr/jails/yabarana-php52/basejail (nullfs, local, read-only)
devfs on /usr/jails/yabarana-php52/dev (devfs, local, multilabel)
fdescfs on /usr/jails/yabarana-php52/dev/fd (fdescfs)
procfs on /usr/jails/yabarana-php52/proc (procfs, local)
/usr/jails/basejail on /usr/jails/yabarana-cat58/basejail (nullfs, local, read-only)
devfs on /usr/jails/yabarana-cat58/dev (devfs, local, multilabel)
fdescfs on /usr/jails/yabarana-cat58/dev/fd (fdescfs)
procfs on /usr/jails/yabarana-cat58/proc (procfs, local)
/usr/jails/basejail on /usr/jails/testbed/basejail (nullfs, local, read-only)
devfs on /usr/jails/testbed/dev (devfs, local, multilabel)
fdescfs on /usr/jails/testbed/dev/fd (fdescfs)
procfs on /usr/jails/testbed/proc (procfs, local)
/usr/jails/basejail on /usr/jails/pyugmao/basejail (nullfs, local, read-only)
devfs on /usr/jails/pyugmao/dev (devfs, local, multilabel)
fdescfs on /usr/jails/pyugmao/dev/fd (fdescfs)
procfs on /usr/jails/pyugmao/proc (procfs, local)
/usr/jails/basejail on /usr/jails/php53base/basejail (nullfs, local, read-only)
devfs on /usr/jails/php53base/dev (devfs, local, multilabel)
fdescfs on /usr/jails/php53base/dev/fd (fdescfs)
procfs on /usr/jails/php53base/proc (procfs, local)
/usr/jails/basejail on /usr/jails/php52base/basejail (nullfs, local, read-only)
devfs on /usr/jails/php52base/dev (devfs, local, multilabel)
fdescfs on /usr/jails/php52base/dev/fd (fdescfs)
procfs on /usr/jails/php52base/proc (procfs, local)
/usr/jails/basejail on /usr/jails/mcs-cat58/basejail (nullfs, local, read-only)
devfs on /usr/jails/mcs-cat58/dev (devfs, local, multilabel)
fdescfs on /usr/jails/mcs-cat58/dev/fd (fdescfs)
procfs on /usr/jails/mcs-cat58/proc (procfs, local)
/usr/jails/basejail on /usr/jails/http-proxy/basejail (nullfs, local, read-only)
devfs on /usr/jails/http-proxy/dev (devfs, local, multilabel)
fdescfs on /usr/jails/http-proxy/dev/fd (fdescfs)
procfs on /usr/jails/http-proxy/proc (procfs, local)
/usr/jails/basejail on /usr/jails/corcaribe-php53/basejail (nullfs, local, read-only)
devfs on /usr/jails/corcaribe-php53/dev (devfs, local, multilabel)
fdescfs on /usr/jails/corcaribe-php53/dev/fd (fdescfs)
procfs on /usr/jails/corcaribe-php53/proc (procfs, local)
devfs on /usr/jails/cmm-php52-1/dev (devfs, local, multilabel)
/usr/jails/basejail on /usr/jails/cm-website/basejail (nullfs, local, read-only)
devfs on /usr/jails/cm-website/dev (devfs, local, multilabel)
fdescfs on /usr/jails/cm-website/dev/fd (fdescfs)
procfs on /usr/jails/cm-website/proc (procfs, local)
/usr/jails/basejail on /usr/jails/cm-idvida/basejail (nullfs, local, read-only)
devfs on /usr/jails/cm-idvida/dev (devfs, local, multilabel)
fdescfs on /usr/jails/cm-idvida/dev/fd (fdescfs)
procfs on /usr/jails/cm-idvida/proc (procfs, local)
/usr/jails/basejail on /usr/jails/cat58base/basejail (nullfs, local, read-only)
devfs on /usr/jails/cat58base/dev (devfs, local, multilabel)
fdescfs on /usr/jails/cat58base/dev/fd (fdescfs)
procfs on /usr/jails/cat58base/proc (procfs, local)
/usr/jails/basejail on /usr/jails/cmm-php52-1/basejail (nullfs, local, read-only)
fdescfs on /usr/jails/cmm-php52-1/dev/fd (fdescfs)
procfs on /usr/jails/cmm-php52-1/proc (procfs, local)
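The mount output above can also be audited mechanically. This is an illustration under assumptions, not an ezjail feature: the hypothetical `audit_mounts` assumes every jail should show the four mounts seen here (a nullfs of /usr/jails/basejail on `basejail`, plus `dev`, `dev/fd` and `proc`). It returns the jails missing any of them and every mount under /usr/jails that does not fit that pattern, such as a mount point nested inside another jail's tree. A jail with no valid mount at all shows up only in the second result.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

# "SOURCE on MOUNTPOINT (FSTYPE, options...)" as printed by mount(8)
MOUNT_RE = re.compile(r"^(?P<src>\S+) on (?P<dst>\S+) \((?P<fstype>[^,)]+)")
# Assumption: the per-jail mounts expected, keyed by path below the jail root.
EXPECTED = {"basejail": "nullfs", "dev": "devfs", "dev/fd": "fdescfs", "proc": "procfs"}


def audit_mounts(mount_output: str, jails_root: str = "/usr/jails") -> Tuple[Dict[str, List[str]], List[str]]:
    prefix = jails_root.rstrip("/") + "/"
    seen = defaultdict(set)  # jail name -> expected suffixes actually mounted
    unexpected = []
    for raw_line in mount_output.splitlines():
        line = raw_line.strip()
        m = MOUNT_RE.match(line)
        if not m or not m["dst"].startswith(prefix):
            continue  # unparsable (e.g. a wrapped continuation) or not under the jails root
        jail, _, suffix = m["dst"][len(prefix):].partition("/")
        ok = EXPECTED.get(suffix) == m["fstype"]
        if ok and m["fstype"] == "nullfs":
            ok = m["src"] == prefix + "basejail"  # basejail must come from the shared copy
        if ok:
            seen[jail].add(suffix)
        else:
            unexpected.append(line)
    missing = {j: sorted(set(EXPECTED) - s) for j, s in seen.items() if set(EXPECTED) - s}
    return missing, unexpected
```

Applied to the output above, this should account for all four mounts in each of the 14 jails, including cmm-php52-1, whose devfs merely happens to be listed before its basejail. A clean result only describes the live mount table, though; it says nothing about what the fstab files will produce on the next boot.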

