panic again

Tue Oct 26 10:34:15 PDT 2004

Hello,

Tuesday, October 26, 2004, 8:36:01 PM, you wrote:

> Hello, Pavel,

> First off, I'm sorry to hear your terrible story.

Really? I thought I wasn't the only one.

...

>> Oct 26 05:11:54 images8 /kernel: panic: ffs_valloc: dup alloc
>> Oct 26 05:11:54 images8 /kernel:
>> Oct 26 05:11:54 images8 /kernel: syncing disks... 49 12
>> Oct 26 05:11:54 images8 /kernel: done

> It's wrong to use dd'ed disk IMHO.  A better solution for your situation
> might be:

I think it's OK in my case, because the drive has no read errors. I
don't know why, but it works slowly and causes NFS slowness in
turn. Maybe it is not a full failure, just a cable; we haven't
investigated the problem yet. I now realise that I should have checked
the copy first. It seems to be working now.

...

>>   My question is: is there any future in FFS, with its panics and
>>   non-working softupdates?

> Frankly, the ONLY way to guarantee data integrity is more backups.
> How can you expect your operating system to run correctly when it is
> running on defective hardware?

Sorry, I didn't explain my point thoroughly. I meant non-working
softupdates on non-faulty hardware. Press "Reset" on a busy server with
many drives, mount -f (softupdates mount?) and you will surely get a
panic within an hour.

>>   I don't see any reliability in such a system.
>>   But what I remember is the high reliability of MS NTFS. I didn't see any
>>   disk checks after any failure and I didn't experience a file loss.
>>   And I didn't see its popular "blue screen" with an error caused by
>>   filesystem code.

> We don't feel that accusing other systems is beneficial, because we don't
> sell FreeBSD, and our interest is to improve *our* system and make it
> even better. Nothing but backups can guarantee that you survive a
> hardware failure.

I wrote that to prove the fact that it's possible to have a reliable
filesystem.

> The intention of panic() in our code is to stop the operating system
> before it can do more damage.  We feel that your data is more important
> than "pretending to work" while silently damaging your data.

I know that. But my opinion after looking at the code is that
panic()s are used even in cases where the error could be fixed or reported
to a higher level. Maybe I'm wrong, but the NTFS case confirms that it's
possible to have no panics in reality.
Also, somehow after a panic on ONE file system, the other filesystems are
not fully synced. The system complains that they are dirty after
restart. So it seems that one panic leads to corruption of other
filesystems. Maybe I'm wrong here too. But I don't see any good in fsck-ing
each time.
Background fsck does not work in reality either, because the system
can panic a thousand times before the errors are fixed.

>>   I  think  that FreeBSD has no future without a reliable FS and clean
>>   code for it.
> There are many efforts focusing on the storage system and file system, but
> we still need more manpower to work on it.
What kind of manpower?

...

>>   Sorry if I wrote too long a letter. Our company is just tired of the
>>   problems related to all of this.
> User experience is important, but again, we need more details and manpower.
I'm not sure if it is possible to do anything with FFS at all.

>>   And, by the way, the FFS code still has an integer divide error in
>>   dirpref(). I tried to report it twice, I saw it reported on the
>>   lists, but nobody cares :( . No future.

> In order to get your problem tracked down, you need to provide more
> information.  A good start is to set a dump area (e.g. if your swap

There is no need to track it down. I already described the problem on
freebsd-stable. If you look at dirpref(), it becomes clear that 32-bit
numbers cannot be used for the calculation. I saw a discussion about that
on some list via Google.
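
To illustrate the kind of problem I mean, here is a minimal sketch. It is
not the actual dirpref() code; the function and parameter names are
hypothetical and the values are chosen only for the example. It shows how
a product kept in 32 bits can wrap to zero, so that using it as a divisor
would trap, and how a 64-bit product plus a zero check avoids that.

```c
#include <stdint.h>

/* Hypothetical helper: product of two sizes, truncated to 32 bits.
 * This mimics storing the result in a 32-bit variable. */
static uint32_t size_product_32(uint32_t avg_file_size, uint32_t files_per_dir)
{
    /* Compute in 64 bits, then truncate explicitly (well-defined wrap). */
    return (uint32_t)((uint64_t)avg_file_size * files_per_dir);
}

/* Same product kept in 64 bits: cannot wrap for 32-bit inputs. */
static uint64_t size_product_64(uint32_t avg_file_size, uint32_t files_per_dir)
{
    return (uint64_t)avg_file_size * files_per_dir;
}

/* Guarded division: returns 0 instead of trapping when the divisor is 0,
 * and never returns more than cap. */
static uint64_t max_contig(uint64_t free_bytes, uint64_t dir_size, uint64_t cap)
{
    uint64_t n;

    if (dir_size == 0)
        return 0;               /* divisor wrapped or was never set */
    n = free_bytes / dir_size;
    return n < cap ? n : cap;
}
```

With both inputs set to 65536, size_product_32() yields 0 while
size_product_64() yields 4294967296, so only the 32-bit version produces a
zero divisor.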

...

> Cheers,

Thanks.

-- 
/ Pavel Merdine