[OT] gcc: maximum length of an array?

Giorgos Keramidas keramida at ceid.upatras.gr
Tue Jul 25 19:13:46 UTC 2006


On 2006-07-27 17:02, "P.U.Kruppa" <root at pukruppa.de> wrote:
>On Sun, 23 Jul 2006, Giorgos Keramidas wrote:
>>On 2006-07-24 20:49, "P.U.Kruppa" <root at pukruppa.de> wrote:
>>> Hi,
>>> sorry for posting an [OT], but usually people on this list know
>>> everything :-)
>>>
>>> Since I don't know too much about programming I am frequently
>>> fascinated by simple things like Eratosthenes' sieve.  As you might
>>> remember, one has to create a boolean array for that. The longer the
>>> array the more primes can be found.
>>>
>>> With malloc() I can create an array of length 100000000 (10^8) and
>>> the first 5761455 primes are calculated in a few seconds.  So of
>>> course I would like to test length 10^9 but here my program crashes.
>>
>> If this is about integer values, which are probably 32-bit, you are
>> hitting the kern.maxdsiz limit of 512 MB.  An array of 100,000,000
>> 32-bit values takes up 4 * 100,000,000 = 400,000,000 bytes (close to
>> 400 MB of memory).  Anything above 512 MB in size will make the data
>> size of your program so big that it will overflow the data seg size:
>>
>>   $ ulimit -a
>>   core file size          (blocks, -c) unlimited
>>   data seg size           (kbytes, -d) 524288
>>   ...
>>
>> You can either increase kern.maxdsiz in your `/boot/loader.conf' file,
>> or redesign the algorithm to work with larger datasets by splitting them
>> in chunks that you can still process with 512 MB of data :)
>
> *How* can I effectively split my array up?

Not by using the original Sieve of Eratosthenes, that's for sure.

By sacrificing some speed, though, you can probably use secondary
storage to make sure that you keep at most 512 MB of data in physical
memory.

> How can I access an element arr[n] if n is bigger than INT_MAX ?
> I have tried some kind of linear/linked list, but that becomes
> disgustingly slow.

Actually, the limit on data offsets you can meaningfully access from a C
program is not INT_MAX, which the standard allows to be as low as +32767
(see page 22 of the ISO/IEC 9899:TC2 public draft of the C standard[1]).

[1] Draft n1124 from http://www.open-std.org/JTC1/SC22/WG14/www/docs/

The largest object size you can access from a conforming C program is
SIZE_MAX (see page 259 of the same PDF document).  The standard doesn't
require `size_t' to be any wider than `int' though, so this may still
be inadequate for processing huge datasets.

You have multiple options, the way I see it:

    * Bump kern.maxdsiz to something higher (this can work for much
      larger datasets than 512 MB, but a little after 2 GB things start
      getting ugly again).

    * Work on an amd64 system with LOTS of physical memory and a high
      kern.maxdsiz value.

    * Try to find a variation of the Sieve of Eratosthenes that can work
      with a smaller memory load (possibly sacrificing, as you guessed,
      some speed for space).

One possible variation would be to keep copies of the data you have
processed in secondary storage and load only the parts needed into
physical memory.  A simplistic implementation of the Sieve of
Eratosthenes may result in heavy thrashing, though, if you just swap
regions of the numeric range in and out as they are accessed :(



