ELF .data section variables and RWX bits

Shill devnull at example.com
Sat May 24 10:53:31 PDT 2003

I wrote a small program stub to measure execution time on an Athlon 4:

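  rdtsc                 ; eax = low 32 bits of the start timestamp
  mov ebp, eax
  xor eax, eax
  cpuid                 ; serialize
                        ; (code under test goes here)
  xor eax, eax
  cpuid                 ; serialize
  rdtsc                 ; eax = low 32 bits of the end timestamp
  sub ebp, eax          ; ebp = start - end
  xor eax, eax
  cpuid                 ; serialize
  neg ebp               ; ebp = end - start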

Note: CPUID is used only as a serializing instruction.

Let n be the number of cycles required to execute the code
between the two RDTSC instructions. At the end of the stub,
ebp is equal to n modulo 2^32.
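
To illustrate why the result is n modulo 2^32, here is a minimal C sketch of the same arithmetic. It assumes only the low 32 bits of each timestamp are kept (the value RDTSC leaves in eax); the function names are hypothetical and chosen for illustration. The result is only meaningful if fewer than 2^32 cycles elapse between the two readings.

```c
#include <stdint.h>

/* Mirrors the stub step by step: mov ebp, eax / sub ebp, eax / neg ebp.
 * start_lo and end_lo stand for the low 32 bits of the two timestamps. */
uint32_t stub_elapsed(uint32_t start_lo, uint32_t end_lo)
{
    uint32_t ebp = start_lo;      /* mov ebp, eax  (first reading)     */
    ebp -= end_lo;                /* sub ebp, eax  (start - end)       */
    ebp = (uint32_t)(0u - ebp);   /* neg ebp       (end - start)       */
    return ebp;
}

/* The same quantity written directly. Unsigned subtraction wraps
 * modulo 2^32, so a counter rollover between readings is harmless. */
uint32_t elapsed_cycles(uint32_t start_lo, uint32_t end_lo)
{
    return end_lo - start_lo;
}
```

For example, with a start value of 0xFFFFFFF0 and an end value of 0x00000058 (the low dword wrapped in between), both functions return 104.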

The stub alone consistently requires 104 cycles to execute.

So far, so good.

I wanted to time the latency of a store, so I declared a single
variable within the .data section:


X: dd 0x12345678

I timed three different programs:
  P1) mov ebx, [X]                  ; load (read only)
  P2) mov dword [X], 0xaabbccdd     ; store (write only)
  P3) add dword [X], byte 0x4C      ; load/execute/store (read + write)

P1 requires 170 cycles.
P2 requires 12000 cycles on average (min 10000, max 46000).
P3 requires 22500 cycles on average (min 14500, max 72000).

NASM gives the ELF .data section the following properties:
  progbits (i.e. explicit contents, unlike the .bss section)
  alloc (i.e. load into memory when the program is run)
  noexec (i.e. clear the allow execute bit)
  write (i.e. set the allow write bit)
  align=4 (i.e. start the section at a multiple of 4)

A cache miss might explain why P1 requires 170 cycles, but as far as I
know it does not explain P2 or P3.

My guess is that the first time X is written to, an exception occurs
(perhaps a TLB miss) and the operating system (FreeBSD in my case) has
to step in and update some internal state before the store can complete.

Could it be that FreeBSD does not set the write bit for the page where X
is stored until X is *actually* written to? But then why would P3 take
much longer than P2?

As you can see, I am very confused. I would be very grateful if an ELF
and/or ld guru could step in and share their insight.

