5.2-RELEASE - Show stopper problem

Ted Wisniewski ted at ness.plymouth.edu
Thu Jan 15 13:29:55 PST 2004


I have some follow-up details that I have discovered that may
make a difference in nailing this down.


	I tried to duplicate the hung-process-on-disk-I/O problem on
an older (hence slower) machine and was unable to duplicate it.  This got
me thinking about the problems I have experienced: it appears that
the faster the machine, the more likely the problem is to occur.  So,
with my new dual 3.06 GHz server I could reproduce the disk (getblk) state
at will.  On the slower 2.0 GHz machine I had to work at it a bit, but I could
reproduce it.  On the 400 MHz machine (Dell PE 4300) I was unable to re-create it
(well, at least I have not been able to yet).  If someone has things to try,
I will give it a whirl.
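
For anyone who wants to compare machines the same way, here is a rough
sketch (not something from the original reports) of how the stuck processes
could be counted from "ps" output.  It assumes the output comes from
"ps -axo pid,state,wchan,command", that a stuck process shows a state
beginning with "D", and that its wait channel reads "getblk" as it did
here.  The function names are made up for illustration.

```python
import subprocess


def find_stuck(ps_output, wchan="getblk"):
    """Return (pid, command) pairs for processes in disk wait on `wchan`.

    Expects text in the layout produced by
    `ps -axo pid,state,wchan,command` (one header row, then one
    process per row).
    """
    stuck = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        # Split into at most 4 fields so the command keeps its spaces.
        fields = line.split(None, 3)
        if len(fields) < 4:
            continue
        pid, state, chan, command = fields
        # "D" as the first state letter means uninterruptible disk wait.
        if state.startswith("D") and chan == wchan:
            stuck.append((int(pid), command))
    return stuck


def current_stuck(wchan="getblk"):
    """Run ps on the local machine and report processes stuck on `wchan`."""
    out = subprocess.run(
        ["ps", "-axo", "pid,state,wchan,command"],
        capture_output=True, text=True, check=True,
    ).stdout
    return find_stuck(out, wchan)
```

Calling current_stuck() every few seconds while the disk is being loaded
would show whether the list of processes keeps growing, which is the
pile-up described in the earlier message.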

Ted


(* (* In the last episode (Jan 11), Ted Wisniewski said:
(* (* > Thanks for your response...  As you can see in this output from the
(* (* > ps command you suggested, the processes are definitely waiting on the
(* (* > disk. BTW..  The system in question was a fresh install from yesterday
(* (* > with no users other than myself (I did the cvsup to get it to
(* (* > 5.2-RELEASE).  It did hang when I did that with a similar result. 
(* (* > One of the "install -s etc.." processes went into the same state.
(* (* 
(* (* Are you seeing any errors in dmesg or /var/log/messages?  I haven't
(* (* seen any other reports of I/O hanging, so it might still be something
(* (* to do with your hardware or kernel config.
(* 
(* 	No messages at all in /var/log/messages.  I am using the generic 
(* kernel in one instance and a custom one in another.   The machine I sent
(* the "ps" info for is a Dell PowerEdge 2650 running a generic kernel.  The
(* disk configuration is a big RAID 5; memory is 2G.  Since I can duplicate it
(* (seemingly at will) on a number of different systems, I doubt it is specific
(* to one machine's hardware (3 Dell servers of differing models, 1 Dell PC,
(* and 3 no-name brand PCs).
(* 
(* (* > On my test system the machine will run for days with this happening,
(* (* > however, I have another system that is actually doing a lot of
(* (* > I/O....  eventually it crashes (well locks up completely)...  If
(* (* > there is any particular info you might need, I am willing to do what
(* (* > I can.
(* (* 
(* (* If you can drop into ddb when it's locked up, I think there are some
(* (* commands you can run to print the kernel locks held by all the
(* (* processes, but I'm not sure what they are or how to interpret the
(* (* results.
(* 
(* 	When it locks up...   It is literally frozen...  Only a power off
(* will cure it.   I have occasionally seen a "page not present" panic..  Most
(* of the time the processes just start to pile up accessing the same place(s)
(* on disk.  None of them can be killed, and whenever I reboot the system
(* after this there is a message about not being able to write buffers...  giving up...
(* 
(* 
(* 
(* Ted
(* 
(* -- 
(* |   Ted Wisniewski    		     E-Mail:  ted at mail.plymouth.edu        |
(* |   Manager, Systems Group           WEB:     http://oz.plymouth.edu/~ted/ |
(* |   Information Technology Services                                        |
(* |   Plymouth State University        Phone:   (603) 535-2661               |
(* |   Plymouth NH, 03264               Fax:     (603) 535-2263               |

