misc/89103: gcc segmentation fault errors

Fri Nov 18 06:00:32 GMT 2005

The following reply was made to PR misc/89103; it has been noted by GNATS.

From: "Walter Roberts" <wroberts at securenym.net>
To: <bug-followup at FreeBSD.org>, <wroberts at securenym.net>
Cc:  
Subject: Re: misc/89103: gcc segmentation fault errors 
Date: Fri, 18 Nov 2005 00:55:21 -0500

 Ruled out hardware issue:

 1.  Ran memtest86 -- 7 full cycles (about 18 hours).
 2.  Reduced memory from 512 MB to 256 MB, then repeated with a different
     memory chip.
 3.  Ran full burncpu, passed.

 Power supplies operating at nominal voltages.

 System is apparently not using swap space for this process.

 Replaced the AMD K6 200 with an old, slower K6 processor.

 Same failure.  CPU temps are <33C in all cases.  I don't know the exact
 numbers, but it's typically around 28C.

 This simply does not smell like a hardware problem, and I've been around
 these beasts for a long time.  The first machine I programmed used
 magnetic core memory and had a whopping 8K of memory with 12-bit words.

 When I ran high energy physics codes on Intel processors quite a few
 years ago, I got inconsistent answers from the same code (all Fortran)
 between the i386 (Intel)/Unix machines and the others (DEC, Cray, Tandem
 and i386 (AMD)).  I finally said that was hardware, but couldn't get
 Intel to believe me until several of us, all running the same code, had
 discussed the issue.  Intel finally admitted that their chips couldn't
 add (and quickly reported to the world that it only affected certain
 'scientific' uses which most people don't need, so the chips were safe
 for balancing your checkbook).

 I'm willing to believe you, but I'd like to know why you're so convinced
 this is a hardware issue.

 The factors pointing against a hardware issue are:

 1.  The machine runs everything else without a problem.
 2.  The machine ran non-stop (no reboots) on a UPS for over half a year
     without a glitch (take that, NT), and it seems to run f90 OK, and
     most cc's OK.
 3.  The system runs very compute- and memory-intensive Monte Carlo high
     energy physics code that stores lots and lots of numbers to be
     written to files at the end of the day, and it works consistently.
     If the hardware weren't working properly, I would expect something
     to be amiss elsewhere: a panic at some point, or the system just
     plain stopping.
 4.  From the archives, it appears that more than one of us is having a
     similar problem.
 5.  This exact system ran for years without a glitch running FreeBSD 2.2
     and FreeBSD 3.2.

 Is it safe to upgrade to GCC 4?  Would that solve the problem?  I'd be
 happy to get it from GNU and try it, if it won't break anything.  I
 don't have the time I used to have to go messing in operating system
 innards, much as I'd like to.

 It is certainly possible that a pointer is misprogrammed (or perhaps the
 fixed-point register in the AMD chip doesn't work right??) and picks up
 something funny that causes the compiler to fail with "segmentation
 fault 11".  That fault is consistent!

 Thanks
