High disk load +mount/atacontrol/NFS/SMBFS crashes the system
Roger Olofsson
raggen at passagen.se
Mon Apr 23 16:48:56 UTC 2007
Alejandro Pulver wrote:
> On Sun, 15 Apr 2007 23:33:47 -0700
> Garrett Cooper <youshi10 at u.washington.edu> wrote:
>
>> Ale,
>> I'm not sure what's going on exactly based on the information you
>> provided, but I would try the following steps to isolate the issue:
>>
>> 1) See if you can upgrade the first machine to a later version of
>> FreeBSD, say 6.2. I believe that there were related issues resolved in
>> 6.2, but my memory could be incorrect. See if your problems occur after
>> that.
>
> I did that.
>
>> 2) Try grabbing a different machine if possible and see if the same
>> issue occurs when you put the new machine as server and client with one
>> of the other machines.
>
> I used a Win XP machine as client / server.
>
>> 3) Try switching roles with the 2 machines. If machine 1 is usually
>> server, let it play client and vice versa with machine 2.
>
> Also did this.
>
>> 4) Remove the new drive if possible, see if issue goes away. If it does,
>> try acquiring a cheap(er) drive and put it
>>
>
> It's the only drive it has, I meant the second machine is all new, not
> just the disk.
>
>> Also, it appears that another FreeBSD team member had a similar issue
>> (see: http://people.freebsd.org/~pho/stress/log/cons205.html and
>> http://people.freebsd.org/~pho/stress/log/cons225.html). I dunno how but
>> it showed up as one of the leading searches on Google.
>>
>> It looks like a (localized) filesystem issue, but I'm not sure what it
>> is exactly.
>>
>
> The fsync() problem seems to be related to that, but the rest could
> be a different thing. Also, I only got it twice. Maybe the filesystem
> issues were only a consequence of the crashes.
>
> I was unable to reproduce the problem on the first machine; maybe it
> was fixed in FreeBSD 6.2 as you said. The only other things I did when
> testing were unloading fuse.ko (unused) and linprocfs.ko (after
> unmounting it). However, I will test it a few more times and let you
> know the results.
>
> The strange crash on the new 6.2 machine when using atacontrol is still
> unexplained, and I couldn't make it happen again (it now refuses to
> switch to UDMA100 mode when it is in SATA300; maybe that mode isn't
> supported on SATA drives, but the other time it just crashed without
> warning).
>
> Thank you for your help with this.
>
> Best Regards,
> Ale
Dear Ale,
I experienced something similar to what you described when this thread
started. The solution for me was to exchange the NIC I had for one that
worked better. I learned that using cheap NICs with RealTek chips causes
crashes even on the most stable operating system in the world.
When I browsed the source code for the driver of the RealTek-based NIC, I
regretted that I hadn't done so earlier. The comments were _crystal_ clear
about its design and performance. See /usr/src/sys/pci/if_rl.c. I
particularly liked the following bit:
/*
 * Here's a totally undocumented fact for you. When the
 * RealTek chip is in the process of copying a packet into
 * RAM for you, the length will be 0xfff0. If you spot a
 * packet header with this value, you need to stop. The
 * datasheet makes absolutely no mention of this and
 * RealTek should be shot for this.
 */
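To make the comment concrete, here is a minimal sketch of the kind of
check it describes. This is not the actual if_rl.c code: the names
RX_LEN_UNFINISHED, rx_header_len and count_ready_packets are hypothetical,
and I am assuming that the packet header is read as one 32-bit word with
the length in its upper 16 bits. Only the value 0xfff0 comes from the
comment itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical name; the value 0xfff0 is taken from the quoted comment. */
#define RX_LEN_UNFINISHED 0xfff0u

/* Assumed layout: 32-bit packet header, length in the upper 16 bits. */
static uint16_t
rx_header_len(uint32_t hdr)
{
	return (uint16_t)(hdr >> 16);
}

/*
 * Walk a list of packet headers and count how many packets are safe to
 * consume. Stop at the first header whose length is 0xfff0, because the
 * chip is still copying that packet into RAM.
 */
static int
count_ready_packets(const uint32_t *hdrs, int n)
{
	int i, ready = 0;

	assert(hdrs != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (rx_header_len(hdrs[i]) == RX_LEN_UNFINISHED)
			break;	/* packet still in flight: stop here */
		ready++;
	}
	return ready;
}
```

With headers whose lengths are 0x0040, 0x05ea, 0xfff0 and 0x0040, the
function returns 2: the loop stops at the in-progress packet and never
looks at the one after it.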
Hope you will solve the issue!
Greetings
/Roger