misc/89103: gcc segmentation fault errors

Mon Nov 21 21:47:02 GMT 2005

On Fri, Nov 18, 2005 at 06:00:32AM +0000, Walter Roberts wrote:
> The following reply was made to PR misc/89103; it has been noted by GNATS.
> 
> From: "Walter Roberts" <wroberts at securenym.net>
> To: <bug-followup at FreeBSD.org>, <wroberts at securenym.net>
> Cc:  
> Subject: Re: misc/89103: gcc segmentation fault errors 
> Date: Fri, 18 Nov 2005 00:55:21 -0500
> 
>  Ruled out hardware issue:
>  
>  1.  Ran memtest 86 -- 7 full cycles (18 hours +/-).
>  2.  Reduced memory from 512Mb to 256Mb, repeated with different memory
>  chip.
>  3.  Ran full burncpu, passed.
>  
>  Power supplies operating at nominal voltages.
>  
>  System is apparently not using swap space for this process.
>  
>  Replaced AMD K6  200 with old K6 slow processor
>  
>  Same failure.  CPU temps are <33C in all cases.  I don't know the exact
>  numbers, but it's typically around 28C.
>  
>  This simply does not smell like a hardware problem

[Snip historical anecdotes]

>   I'm willing to believe you,
>  but I'd like to know why you're so convinced this is a hardware issue.

Because I've been answering these questions for years, and I've seen
dozens of people start out saying "I'm convinced it's not a hardware
problem" and then working their way around to "it was a hardware
problem, sorry for wasting your time".

>  The factors pointing against a hardware issue are:  1.  The machine runs
>  everything else without a problem.  2.  The machine ran non-stop
>  (non-reboot) on a UPS for over half a year without a glitch (take
>  that, NT), and it seems to run f90 ok, and most cc's ok.  3.  The system
>  runs very compute/memory intensive Monte Carlo high energy physics code
>  that stores lots and lots of numbers to be written to files at the end
>  of the day and works consistently.  I would expect that if it weren't
>  working properly, something would be amiss elsewhere and would expect a
>  panic at some point, or the system to just plain stop working.  4.  From
>  the archives it appears that more than one of us is having a similar
>  problem.

Not that I've seen.  Where are these other reports?

>  5.  This exact system ran for years without a glitch running
>  FreeBSD 2.2 and FreeBSD 3.2.

This kind of problem can be *very* workload-specific, i.e. everything
will work fine except for one task that tickles the machine in exactly
the right way to trigger the hardware failure.

Yes, I've seen exactly this scenario happen many times.

>  Is it safe to upgrade to GCC 4?  Would that solve the problem?  I'd be
>  happy to get it from gnu and try it, if it won't break anything.  I
>  don't have the time I used to have to go messing in operating system
>  innards, much as I'd like to.

It won't fix a hardware problem, naturally.  You can't use a
non-system compiler to compile FreeBSD, although you could compile
your own code with it.

>  It is certainly possible that a pointer is misprogrammed (or perhaps the
>  fixed point register in the AMD chip doesn't work right??) and picks up
>  something funny that causes the compiler to have the "segmentation
>  fault 11".  That fault is consistent!

I'm sure it's consistent on this machine, but you're really reaching
by suggesting that it's a CPU bug affecting thousands of users :-)

Kris

P.S. Did you say in a previous email that the machine worked fine when
it was running at a site at high altitude, but stopped working when
you moved it and then upgraded it?  That's a big clue that says
something broke at that point (or before, but was masked by lower
ambient temperatures, or something).