The optimization of malloc(3): FreeBSD vs GNU libc

Intron is my Internet alias mag at intron.ac
Wed Aug 16 10:05:41 UTC 2006


Jason Evans wrote:

> (LI Xin) wrote:
>> On 2006-08-15 at 02:38 +0300, Vladimir Kushnir wrote:
>>> On -CURRENT amd64 (Athlon64 3000+, 512k L2 cache):
>>> 
>>> With jemalloc (without MY_MALLOC):
>>>   ~/fdtd> /usr/bin/time ./fdtd.FreeBSD 500 500 1000
>>> ...
>>> 116.34 real       113.69 user         0.00 sys
>>> 
>>> With MY_MALLOC:
>>>   ~/fdtd> /usr/bin/time ./fdtd.FreeBSD 500 500 1000
>>> ...
>>> 45.30 real        44.29 user         0.00 sys
>> 
>> Have you turned off the debugging options, i.e. ln -sf
>> 'aj' /etc/malloc.conf?
> 
> If you want to do a fair comparison, you will also define NO_MALLOC_EXTRAS 
> when compiling malloc.c, in order to turn off the copious assertions, not 
> to mention the statistics gathering code.
> 
> Before you do that though, it would be useful to turn statistics reporting 
> on (add MALLOC_OPTIONS=P to your environment when running the test 
> program) and see what malloc says is going on.
> 
> [I am away from my -current system at the moment, so can't benchmark the 
> program.]  If I understand the code correctly (and assuming the command 
> line parameters specified), it starts out by requesting 3517 2000-byte 
> allocations from malloc, and never frees any of those allocations.

You're right. My friend's program computes an electromagnetic field
using the Finite-Difference Time-Domain (FDTD) method.

> 
> Both phkmalloc and jemalloc will fit two allocations in each page of 
> memory.  phkmalloc will call sbrk() at least 1759 times.  jemalloc will 
> call sbrk() approximately 6 times.  2kB allocations are a worst case for 
> some of jemalloc's internal bookkeeping, but this shouldn't be a serious 
> problem.  Fragmentation for these 2000-byte allocations should total 
> approximately 6%.

The same bad case arises with mmap(2) if it is called to obtain one small
memory block at a time. A hierarchical memory management mechanism is
required, like those in GNU libc and in your new code.
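
To make "hierarchical" concrete, here is a minimal sketch of the idea: ask the
kernel for memory in large chunks and carve small blocks out of each chunk in
user space. It is not how jemalloc or GNU libc actually work. The names
chunk_alloc, CHUNK_SIZE and os_requests are made up for illustration, the
1 MiB chunk size is an arbitrary assumption, and nothing is ever freed.

```c
#define _DEFAULT_SOURCE		/* glibc only: expose MAP_ANON in strict modes */
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical chunk size: 1 MiB per request to the kernel. */
#define CHUNK_SIZE	((size_t)1 << 20)

static char		*chunk_cur = NULL;	/* next free byte in the current chunk */
static size_t		 chunk_left = 0;	/* bytes still unused in that chunk */
static unsigned long	 os_requests = 0;	/* number of mmap(2) calls so far */

/*
 * Carve n bytes out of the current chunk, mapping a new chunk only when
 * the current one cannot satisfy the request.  The tail of an exhausted
 * chunk is simply abandoned, and there is no free(): this is a sketch.
 */
static void *
chunk_alloc(size_t n)
{
	void *p;

	if (n == 0)
		n = 1;
	n = (n + 15) & ~(size_t)15;	/* round up to keep 16-byte alignment */
	if (n > CHUNK_SIZE)
		return (NULL);		/* large requests are out of scope here */
	if (n > chunk_left) {
		p = mmap(NULL, CHUNK_SIZE, PROT_READ | PROT_WRITE,
		    MAP_PRIVATE | MAP_ANON, -1, 0);
		if (p == MAP_FAILED)
			return (NULL);
		chunk_cur = p;
		chunk_left = CHUNK_SIZE;
		os_requests++;
	}
	p = chunk_cur;
	chunk_cur += n;
	chunk_left -= n;
	return (p);
}
```

Under these assumptions, calling chunk_alloc(2000) 3517 times maps only 7
chunks, because each 1 MiB chunk holds 524 such blocks. Compare that with the
1759 or more kernel requests needed when every page is obtained separately.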

The essence of the problem is that the operating system's memory
management can greatly affect how efficiently the CPU hardware works.

Actually, it is not only my friend's program that can trigger the problem,
but also the many applications that call strdup(3) frequently.

/usr/src/lib/libc/string/strdup.c:

char *
strdup(str)
	const char *str;
{
	size_t len;
	char *copy;

	len = strlen(str) + 1;
	if ((copy = malloc(len)) == NULL)
		return (NULL);
	memcpy(copy, str, len);
	return (copy);
}
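
To see how often a program hits malloc(3) with tiny requests through this
path, one could wrap the same logic with a counter. This is only a sketch:
counting_strdup, small_allocs and the 64-byte threshold are hypothetical and
not part of libc.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical threshold below which a request counts as "small". */
#define SMALL_LIMIT	64

static unsigned long small_allocs = 0;	/* small requests seen so far */

/*
 * Same logic as strdup(3) above: one malloc(3) call of strlen(str) + 1
 * bytes per duplicated string, plus a counter for the small ones.
 */
static char *
counting_strdup(const char *str)
{
	size_t len;
	char *copy;

	len = strlen(str) + 1;
	if ((copy = malloc(len)) == NULL)
		return (NULL);
	if (len < SMALL_LIMIT)
		small_allocs++;
	memcpy(copy, str, len);
	return (copy);
}
```

Every short string duplicated this way costs one separate small malloc(3)
request, so the allocator's handling of small blocks decides the cost.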

> 
> malloc certainly incurs more overhead than a specialized sbrk()-based 
> allocator, but I don't see any particular reason that jemalloc should 
> perform suboptimally, as compared to any complete malloc implementation, 
> for fdtd.  If you find otherwise, please tell me so that I can look into 
> the issue.
> 
> Thanks,
> Jason


------------------------------------------------------------------------
                                                From Beijing, China


