7-STABLE broke drscheme in week between 4 and 11 Jan

Andrew Reilly andrew-freebsd at areilly.bpc-users.org
Wed Jan 23 12:32:27 PST 2008


Hi John,

On 24/01/2008, at 01:04, John Baldwin wrote:
>> Anyway, the last part of the ktrace of the broken version (the
>> earlier parts are just loading up shared libraries) looks like:
>> (I sedded the ^pid out, so that I could get a better look at it
>> with diff (meld, actually: it's nice)).
>
> There were changes to make binaries get SIGSEGV instead of SIGBUS (or vice
> versa) for certain VM-related traps.  That might be related, in which case
> the source code for the app will need to catch a different signal.  You can
> test this by fiddling with the machdep.prot_fault_translation sysctl.

That wasn't the problem: version 372 already had a patch to use SIGSEGV.
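
For the archives: an application that wants to be indifferent to which
of the two signals the kernel delivers can register one handler for
both.  The following is only a sketch; the names on_fault,
install_fault_handlers and last_fault_signal are made up for
illustration and are not taken from drscheme.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Hypothetical flag recording which signal was delivered last. */
static volatile sig_atomic_t last_fault_signal = 0;

/* Hypothetical handler: it only records the signal number.  A handler
 * for a real memory fault must not simply return, because the faulting
 * instruction would be restarted and fault again. */
static void on_fault(int signo)
{
    last_fault_signal = signo;
}

/* Register the same handler for SIGSEGV and SIGBUS so the program
 * behaves the same whichever one is delivered.
 * Returns 0 on success, -1 on failure. */
static int install_fault_handlers(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_fault;
    sigemptyset(&sa.sa_mask);

    if (sigaction(SIGSEGV, &sa, NULL) != 0)
        return -1;
    if (sigaction(SIGBUS, &sa, NULL) != 0)
        return -1;
    return 0;
}
```

Calling install_fault_handlers() once at start-up and then raise(SIGBUS)
or raise(SIGSEGV) is enough to check that both paths reach the handler.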

>> the faulting instruction is:
>> 0x0000000800bdecc6 <GC_init_type_tags+598>:	mov %r13,(%rdx,%rax,8)
>>
>> r13 is 0x803900000
>> rdx is -1
>> rax is 0xe40
>>
>> Any thoughts on why that would be faulting?  (Looks like a write
>> to a very low address (code?) to me, but I'm not up on the VM
>> map yet.)
>
> rdx should be a pointer to an array or some such, but it is a bogus
> pointer and that is why you are faulting.
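
To make the arithmetic explicit: the destination operand
(%rdx,%rax,8) addresses base + index * scale, so with the register
values quoted above the store goes to -1 + 0xe40 * 8 = 0x71ff.  A small
sketch of that calculation (effective_address is a made-up helper name):

```c
#include <stdint.h>

/* Effective address of an x86-64 memory operand written in AT&T syntax
 * as (base,index,scale).  Unsigned 64-bit arithmetic wraps modulo 2^64,
 * which matches how the CPU computes the address. */
static uint64_t effective_address(uint64_t base, uint64_t index,
                                  unsigned scale)
{
    return base + index * scale;
}
```

effective_address((uint64_t)-1, 0xe40, 8) evaluates to 0x71ff, an
address in the first few pages of the address space.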

Yes, that was indeed the problem.  The garbage collector was
expecting memory returned by malloc to be zero-filled, which, of
course, is not guaranteed.  It seems it was simply luck that the
particular location reliably read as zero before this change to
FreeBSD.  I should get more used to using the debugging support in our
malloc, to grunge up malloc'd memory.
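
A sketch of the general shape of the fix, and of a way to flush out
this class of bug, follows.  It is not the actual drscheme patch; all
three function names are hypothetical.  The 0xa5 fill byte is meant to
mimic what I understand malloc(3)'s junk-filling debug option does, so
treat that value as an assumption.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical table allocator: calloc guarantees all-bits-zero memory,
 * malloc does not.  (All-bits-zero is a null pointer on FreeBSD/amd64;
 * ISO C does not strictly promise that on every platform.) */
static void **alloc_tag_table(size_t n)
{
    return calloc(n, sizeof(void *));
}

/* Equivalent form for code that has to keep using malloc:
 * clear the block explicitly before first use. */
static void **alloc_tag_table_memset(size_t n)
{
    void **t;

    if (n > SIZE_MAX / sizeof(void *))
        return NULL;                    /* size would overflow */
    t = malloc(n * sizeof(void *));
    if (t != NULL)
        memset(t, 0, n * sizeof(void *));
    return t;
}

/* Hypothetical debugging wrapper: fill fresh memory with a non-zero
 * pattern so that code relying on "malloc returns zeros" fails at once
 * instead of working by luck. */
static void *junk_malloc(size_t size)
{
    void *p = malloc(size);

    if (p != NULL)
        memset(p, 0xa5, size);
    return p;
}
```

Swapping junk_malloc in for malloc during testing should make a
dependence on zeroed memory show up immediately, rather than only after
an unrelated change shifts the heap layout.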

>
>> The only thing that looks appropriate that changed in that week
>> was sys/vm/vm_map.c, which had some new code added to help with
>> shm mappings.  I looked at it, but it didn't look as though it
>> would affect this.
>
> Those changes were only in HEAD, so if you are seeing them in your kernel
> you are running HEAD and not RELENG_7.

Yes.  I did the binary-search thing, and found that the actual change
that broke things was an MFC of src/contrib/gcc/gthr-posix.h, on 5
Jan.  There's no obvious (to me) reason why that change would have
affected the malloc system, but there must have been some epsilon of
memory alignment that pushed a non-zero value into the particular
memory that was being returned and tested in that instance.  An
entertaining debugging experience...

Patches to drscheme have been accepted up-stream, so all should be  
dandy very soon.

Cheers,

Andrew