misc/89103: gcc segmentation fault errors
Walter Roberts
wroberts at securenym.net
Fri Nov 18 06:00:32 GMT 2005
The following reply was made to PR misc/89103; it has been noted by GNATS.
From: "Walter Roberts" <wroberts at securenym.net>
To: <bug-followup at FreeBSD.org>, <wroberts at securenym.net>
Cc:
Subject: Re: misc/89103: gcc segmentation fault errors
Date: Fri, 18 Nov 2005 00:55:21 -0500
Ruled out hardware issue:
1. Ran memtest86 for 7 full cycles (roughly 18 hours).
2. Reduced memory from 512 MB to 256 MB, then repeated with a different
memory chip.
3. Ran full burncpu, passed.
Power supplies operating at nominal voltages.
System is apparently not using swap space for this process.
Replaced the AMD K6 200 with an old, slower K6 processor: same failure.
CPU temperatures are below 33C in all cases. I don't know the exact
numbers, but it's typically around 28C.
This simply does not smell like a hardware problem, and I've been around
these beasts for a long time. The first machine I programmed used
magnetic core memory and had a whopping 8K of memory with 12-bit words.
When I ran high energy physics codes on Intel processors quite a few
years ago, I got inconsistent answers from the same code (all Fortran)
between the i386 (Intel)/Unix machine and other machines (DEC, Cray,
Tandem and i386 (AMD)). I finally said that was hardware, but couldn't
get Intel to believe me until several of us, all running the same code,
had discussed the issue. Intel finally admitted that their chips couldn't
add (and quickly reported to the world that it only affected certain
'scientific' uses that most people don't need, so the chips were safe
for balancing your checkbook). I'm willing to believe you, but I'd like
to know why you're so convinced this is a hardware issue.
The factors pointing against a hardware issue are:
1. The machine runs everything else without a problem.
2. The machine ran non-stop (no reboots) on a UPS for over half a year
without a glitch (take that, NT), and it seems to run f90 OK, and most
cc runs OK.
3. The system runs very compute- and memory-intensive Monte Carlo high
energy physics code that stores lots and lots of numbers to be written
to files at the end of the day, and it works consistently. If the
hardware weren't working properly, I would expect something to be amiss
elsewhere, and a panic at some point, or the system to just plain stop
working.
4. From the archives it appears that more than one of us is having a
similar problem.
5. This exact system ran for years without a glitch running FreeBSD 2.2
and FreeBSD 3.2.
Is it safe to upgrade to GCC 4? Would that solve the problem? I'd be
happy to get it from GNU and try it, if it won't break anything. I
don't have the time I used to have to go messing in operating system
innards, much as I'd like to.
It is certainly possible that a pointer is misprogrammed (or perhaps the
fixed point register in the AMD chip doesn't work right??) and picks up
something funny that causes the compiler to fail with "segmentation
fault 11". That fault is consistent!
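One way to back up the claim that the fault is consistent would be to
repeat the failing compile several times and compare how each run ends.
The following is only a minimal sketch in Python under stated
assumptions: the helper name tally_outcomes is made up for illustration,
and the command passed to it is whatever compile line triggers the fault
(none is given in this PR). Flaky memory or CPU faults tend to show up
as outcomes that vary from run to run, while a single repeated outcome
points more toward software.

```python
import subprocess
from collections import Counter


def tally_outcomes(cmd, runs=10):
    """Run `cmd` `runs` times and count the distinct ways it ended.

    Each outcome is keyed by (return code, stderr text). On POSIX a
    negative return code means the process was killed by that signal
    number. Note that a compiler driver may report a crashed sub-process
    through its stderr message and an ordinary non-zero exit status,
    which is why stderr is part of the key.
    """
    outcomes = Counter()
    for _ in range(runs):
        result = subprocess.run(cmd, capture_output=True, text=True)
        outcomes[(result.returncode, result.stderr.strip())] += 1
    return outcomes


def is_consistent(outcomes):
    """True when every run ended in exactly the same way."""
    return len(outcomes) == 1
```

For example, calling tally_outcomes(["cc", "-c", "failing_file.c"]) with
a hypothetical failing_file.c and getting back a single key would match
the "consistent" behaviour described above. It would not prove the
hardware is good, only that the failure is repeatable.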
Thanks