UFS Filesystem issues, and the loss of my hair...
John Baldwin
jhb at freebsd.org
Fri Aug 7 12:44:44 UTC 2009
On Thursday 06 August 2009 9:51:04 am Hearn, Trevor wrote:
> First off, let me state that I love FreeBSD. I've used it for years, and
> have not had any major problems with it... Until now.
>
> As you can tell, I work for a major university. I set up a large storage
> array to hold data for a project they have here. No great shakes, just some
> standard files and such. The fun started when I started loading users onto
> the system, and they started using it... Isn't that always the case? Now, I
> get ufs_dirbad errors, and the system hard locks. This isn't the worst thing
> that could happen, but when you're talking about file partitions the size
> that I am using, the fsck takes FOREVER. Somewhere on the order of 1.5 hours.
> During that time, I am bringing the individual shares/partitions online, but
> the users suffer. I've asked about this before, in a different forum, but got
> no usable information that I could see. So, here goes...
>
> The system is as follows. A Dell 2950 1U server, with a QLogic Fibre Channel
> card. It is connected to two Promise Array chassis, 610 series, each with 16
> drives. Each chassis is running RAID 6, which gives me about 12.73 TB of
> storage per chassis. From there, the logical drives are sliced up into
> smaller partitions. At most, I have a 3.6 TB partition. The smallest is a
> 100 GB partition.
>
> Filesystem       Size    Used   Avail  Capacity  Mounted on
> /dev/mfid0s1a    197G     10G    170G      6%    /
> devfs            1.0K    1.0K      0B    100%    /dev
> /dev/da0p1       1.8T    1.5T    130G     92%    /slice1
> /dev/da0p5       2.7T    1.8T    661G     74%    /slice2
> /dev/da0p9       250G     21G    209G      9%    /slice3
> /dev/da1p3       103G     12G     83G     12%    /slice4
> /dev/da1p4       205G     54G    135G     29%    /slice5
> /dev/da1p5       103G    7.3G     87G      8%    /slice6
> /dev/da1p6       103G     22G     72G     23%    /slice7
> etc...
>
> I had to use GPT to set up the partitions, and they are using UFS2 for the
> filesystem. Now... If that's not fun enough... I have TWO of these creatures,
> which rsync every 4 hours. The secondary system is across campus, and sits
> idle 99% of the time. Every 4 hours, in a stepped schedule, the primary array
> syncs to the secondary array. If the primary goes down, I fsck, and any files
> that are fried, I bring back across from the secondary and replace them. This
> has worked OK for a while, but now I am getting kernel panics on a regular
> basis. I've been told to migrate to a different filesystem, but my options
> are ZFS and using gjournal with UFS, from what I can tell. I need something
> repeatable, simple, and robust. I have NO idea why I keep getting errors like
> this, but I imagine it's a cascading effect of other hangs that have caused
> more corruption.
>
> I'd buy a fella, or gal, a cup of coffee and a pop-tart if they could help a
> brother out. I have checked out this link:
>
> http://phaq.phunsites.net/2007/07/01/ufs_dirbad-panic-with-mangled-entries-in-ufs/
>
> and decided that I need to give this a shot after hours, but being the kinda
> guy I am, I need to make sure I am covering all of my bases.

Are you seeing ufs_dirbad panics? Specifically, can you capture the messages
on the console when the machine panics?
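
For reference, one common way to make sure the panic message survives the
reboot is to enable kernel crash dumps. The fragment below is a minimal
sketch, assuming a stock FreeBSD install with a swap device large enough to
hold the dump. dumpdev and dumpdir are standard rc.conf(5) variables, but the
values shown are assumptions to adapt to the local system.

```sh
# /etc/rc.conf fragment (assumed values, not taken from the reporter's system)
dumpdev="AUTO"        # dump kernel memory to the configured swap device on panic
dumpdir="/var/crash"  # directory where savecore(8) stores recovered dumps at boot
```

After the next panic and reboot, savecore(8) should leave vmcore.N and info.N
files under the dump directory; info.N records the panic string.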
--
John Baldwin
More information about the freebsd-fs mailing list