High disk load +mount/atacontrol/NFS/SMBFS crashes the system

Roger Olofsson raggen at passagen.se
Mon Apr 23 16:48:56 UTC 2007


Alejandro Pulver wrote:
> On Sun, 15 Apr 2007 23:33:47 -0700
> Garrett Cooper <youshi10 at u.washington.edu> wrote:
> 
>> Ale,
>> 	I'm not sure what's going on exactly based on the information you 
>> provided, but I would try the following steps to isolate the issue:
>>
>> 1) See if you can upgrade the first machine to a later version of 
>> FreeBSD, say 6.2. I believe that there were related issues resolved in 
>> 6.2, but my memory could be incorrect. See if your problems occur after 
>> that.
> 
> I did that.
> 
>> 2) Try grabbing a different machine if possible and see if the same 
>> issue occurs when you put the new machine as server and client with one 
>> of the other machines.
> 
> I used a Win XP machine as client / server.
> 
>> 3) Try switching roles with the 2 machines. If machine 1 is usually 
>> server, let it play client and vice versa with machine 2.
> 
> Also did this.
> 
>> 4) Remove the new drive if possible, see if issue goes away. If it does, 
>> try acquiring a cheap(er) drive and put it
>>
> 
> It's the only drive it has, I meant the second machine is all new, not
> just the disk.
> 
>> 	Also, it appears that another FreeBSD team member had a similar issue 
>> (see: http://people.freebsd.org/~pho/stress/log/cons205.html and 
>> http://people.freebsd.org/~pho/stress/log/cons225.html). I dunno how but 
>> it showed up as one of the leading searches on Google.
>>
>> 	It looks like a (localized) filesystem issue, but I'm not sure what it 
>> is exactly.
>>
> 
> The fsync() problem seems to be related to that, but the rest could be
> a different thing. Also I only got it twice. Maybe the filesystem
> issues were only derived from the crashes.
> 
> I was unable to reproduce the problem in the first machine, maybe it
> was fixed on FreeBSD 6.2 as you said. The only things I also did when
> testing was unloading fuse.ko (unused) and linprocfs.ko (after
> umounting it). However I will test it a few times more, and let you
> know the results.
> 
> The strange crash in the new 6.2 machine when using atacontrol is still
> unexplained and I couldn't make it happen again (it now refuses to
> switch to UDMA100 mode when it is SATA300, maybe they aren't supported
> in SATA drives, but the other time it just crashed without warning).
> 
> Thank you for your help with this.
> 
> Best Regards,
> Ale

Dear Ale,

I have experienced something similar to what you described when this thread 
started. The solution for me was to exchange the NIC I had for one that 
worked better. I learned that using cheap NICs with Realtek chips causes 
crashes even on the most stable operating system in the world.

When I browsed the source code for the driver of the Realtek-based NIC I 
regretted I hadn't done so earlier. The comments were _crystal_ clear 
about the design and performance of it. See /usr/src/sys/pci/if_rl.c. I 
particularly liked the following bit:

                 /*
                  * Here's a totally undocumented fact for you. When the
                  * RealTek chip is in the process of copying a packet into
                  * RAM for you, the length will be 0xfff0. If you spot a
                  * packet header with this value, you need to stop. The
                  * datasheet makes absolutely no mention of this and
                  * RealTek should be shot for this.
                  */
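
To make the point of that comment concrete, here is a minimal sketch of the
kind of check it describes. This is not the code from if_rl.c. The function
name, the linear (non-wrapping) buffer, and the 4-byte header layout (16-bit
status followed by 16-bit length, little-endian, packets padded to 4 bytes)
are assumptions made for illustration only. The one thing taken from the
comment is the 0xfff0 length value that means "stop, the chip is still
copying this packet".

```c
#include <stddef.h>
#include <stdint.h>

/* Length value the driver comment says marks a packet still being copied. */
#define RX_UNFINISHED 0xfff0

/* Assumed header size: 16-bit status + 16-bit length (illustrative). */
#define RX_HDR_LEN 4

/* Read a little-endian 16-bit value from a byte buffer. */
static uint16_t read_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/*
 * Hypothetical helper: walk a linear receive buffer and count the packets
 * that are completely present. A real driver would use a wrapping ring
 * buffer and also inspect the status bits; both are omitted here.
 */
size_t count_ready_packets(const uint8_t *buf, size_t buflen)
{
    size_t off = 0;
    size_t count = 0;

    while (off + RX_HDR_LEN <= buflen) {
        uint16_t len = read_le16(buf + off + 2);

        /* The chip is still writing this packet: stop and retry later. */
        if (len == RX_UNFINISHED)
            break;

        /* Empty or truncated entry: nothing more to consume safely. */
        if (len == 0 || off + RX_HDR_LEN + len > buflen)
            break;

        count++;
        /* Advance past header and payload, rounded up to 4 bytes. */
        off = (off + RX_HDR_LEN + len + 3) & ~(size_t)3;
    }
    return count;
}
```

With a buffer holding one complete 4-byte packet followed by a header whose
length field is 0xfff0, the helper counts the first packet and then stops
instead of trusting the bogus length.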

Hope you will solve the issue!

Greetings
/Roger

