mlockall() failure and direction for possible solution
Kostik Belousov
kostikbel at gmail.com
Sun Apr 5 08:59:24 PDT 2009
On Sun, Apr 05, 2009 at 01:51:44PM +0200, Hans Ottevanger wrote:
> Hi folks,
>
> As has been noted before, there is an issue with the mlockall() system
> call always failing on (at least) the amd64 architecture. This is quite
> evident by the automounter (as configured out-of-the-box) printing error
> messages on startup like:
>
> Couldn't lock process pages in memory using mlockall()
>
> I have verified the occurrence of this issue on the amd64 platform on
> 7.1-STABLE and 8.0-CURRENT. On the i386 platform this problem does not
> occur.
>
> To investigate this issue a bit further I ran the following trivial program:
>
> #include <stdio.h>
> #include <stdlib.h>
> #include <unistd.h>
> #include <sys/mman.h>
>
> int main(int argc, char *argv[])
> {
>     if (mlockall(MCL_CURRENT|MCL_FUTURE) == -1)
>         perror(argv[0]);
>
>     char command[80];
>     snprintf(command, 80, "procstat -v %d", getpid());
>     system(command);
>
>     exit(0);
> }
>
> which yields (using 8.0-CURRENT as of today, on an Intel DP965LT board
> with a Q6600 and 8 Gbyte RAM, GENERIC kernel stripped of unused devices,
> output folded to 72 characters per line):
>
> /mltest: Resource temporarily unavailable
> PID START END PRT RES PRES REF SHD FL TP
> PATH
> 1064 0x400000 0x401000 r-x 1 0 1 0 CN vn
> /root/mlockall/mltest
> 1064 0x500000 0x501000 rw- 1 0 1 0 CN df
> 1064 0x501000 0x600000 rwx 255 0 1 0 -- df
> 1064 0x800500000 0x80052c000 r-x 44 0 64 31 CN vn
> /libexec/ld-elf.so.1
> 1064 0x80052c000 0x800534000 rw- 8 0 1 0 C- df
> 1064 0x80062b000 0x800633000 rw- 8 0 1 0 CN vn
> /libexec/ld-elf.so.1
> 1064 0x800633000 0x80063f000 rw- 12 0 1 0 C- df
> 1064 0x80063f000 0x80072e000 r-x 239 0 128 62 CN vn
> /lib/libc.so.7
> 1064 0x80072e000 0x80072f000 r-x 1 0 1 0 CN vn
> /lib/libc.so.7
> 1064 0x80072f000 0x80082f000 r-x 51 0 128 62 CN vn
> /lib/libc.so.7
> 1064 0x80082f000 0x80084f000 rw- 32 0 1 0 C- vn
> /lib/libc.so.7
> 1064 0x80084f000 0x800865000 rw- 6 0 1 0 CN df
> 1064 0x800900000 0x800965000 rw- 101 0 1 0 -- df
> 1064 0x800965000 0x800a00000 rw- 155 0 1 0 -- df
> 1064 0x7ffffffe0000 0x800000000000 rwx 3 0 1 0 C- df
>
> I have hunted down the exact location in the kernel where the call to
> mlockall() returns an error (just using printf's, debugging using
> Firewire proved not to be as trivial to set up as it was just a few
> years ago). It appears that while wiring the memory, finally vm_fault()
> is called and it bails out at line 412 of vm_fault.c. The virtual
> address of the page that the system is attempting to wire (argument
> vaddr of vm_fault()) is 0x800762000. From the procstat output above it
> appears that this is in the third region backed by /lib/libc.so.7.
>
> This made me think that the issue might be somehow related to the way in
> which dynamic libraries are linked at run time. Indeed, if the above program
> is linked -statically- it does not fail. Also, if the program is compiled
> and linked -dynamically- on an i386 platform and run on an amd64, it runs
> successfully.
>
> To make a long story at least a bit shorter, I found that the problem is
> in /usr/src/libexec/rtld_elf/map_object.c at line 156. Here a contiguous
> region is staked out for the code and data. For the amd64, where the
> required alignment of the segments is 1 Mbyte, this causes a region to
> be mapped that is far larger than the library file by which it is
> backed. Addresses that are not backed by the file cannot be resident and
> hence the region cannot be locked into memory. On the i386 architecture
> this problem does not occur since the alignment of the segments is just
> 4 Kbytes. I suspect that the problem also occurs at least on the sparc64
> architecture.
>
> As a first step to a possible solution you can apply the attached
> (provisional) patch, that uses an anonymous, read-only mapping to create
> the required region.
>
> The output of the above program then becomes:
>
> PID START END PRT RES PRES REF SHD FL TP
> PATH
> 1302 0x400000 0x401000 r-x 1 0 1 0 CN vn
> /root/mlockall/mltest
> 1302 0x500000 0x501000 rw- 1 0 1 0 -- df
> 1302 0x800500000 0x80052c000 r-x 44 0 8 4 CN vn
> /libexec/ld-elf.so.1
> 1302 0x80052c000 0x800534000 rw- 8 0 1 0 -- df
> 1302 0x80062b000 0x800633000 rw- 8 0 1 0 C- vn
> /libexec/ld-elf.so.1
> 1302 0x800633000 0x80063f000 rw- 12 0 1 0 -- df
> 1302 0x80063f000 0x80072e000 r-x 239 0 124 62 CN vn
> /lib/libc.so.7
> 1302 0x80072e000 0x80072f000 r-x 1 0 1 0 C- vn
> /lib/libc.so.7
> 1302 0x80072f000 0x80082f000 r-- 256 0 1 0 -- df
> 1302 0x80082f000 0x80084f000 rw- 32 0 1 0 C- vn
> /lib/libc.so.7
> 1302 0x80084f000 0x800865000 rw- 22 0 1 0 -- df
> 1302 0x7ffffffe0000 0x800000000000 rwx 32 0 1 0 -- df
>
> i.e. mlockall() does not return an error anymore.
>
> I still have the following questions:
>
> 1. Is it worth the trouble to solve the mlockall() problem at all ? Should
> I file a PR ?
Yes. File a PR if you want, but I see no reason to.
Your analysis looks correct and useful.
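The address arithmetic is easy to double-check. The following is only a
sketch: the region table is copied by hand from the first procstat listing,
and the names (struct region, find_region, page_offset, PAGE_BYTES) are made
up for illustration, assuming 4 Kbyte pages.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4 Kbyte pages, as on amd64. */
#define PAGE_BYTES 4096u

/* Hypothetical type: one line of procstat -v output. */
struct region {
	uint64_t start;	/* inclusive */
	uint64_t end;	/* exclusive */
};

/* The four /lib/libc.so.7 regions from the first procstat listing. */
static const struct region libc_regions[] = {
	{ 0x80063f000ULL, 0x80072e000ULL },
	{ 0x80072e000ULL, 0x80072f000ULL },
	{ 0x80072f000ULL, 0x80082f000ULL },
	{ 0x80082f000ULL, 0x80084f000ULL },
};

/* Return the index of the region containing addr, or -1 if none does. */
static int
find_region(const struct region *r, size_t n, uint64_t addr)
{
	for (size_t i = 0; i < n; i++)
		if (addr >= r[i].start && addr < r[i].end)
			return ((int)i);
	return (-1);
}

/* Return the zero-based page number of addr within region r. */
static uint64_t
page_offset(const struct region *r, uint64_t addr)
{
	assert(addr >= r->start && addr < r->end);
	return ((addr - r->start) / PAGE_BYTES);
}
```

For the faulting address 0x800762000, find_region() gives index 2 (the third
libc region) and page_offset() gives page 51 of a 256-page region, which
lines up with the RES count of 51 that procstat shows for that region.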
>
> 2. Can someone confirm that it also occurs on the other 64 bit
> architectures ?
>
> 3. It might be more elegant to use PROT_NONE instead of PROT_READ when
> just staking out the address space. Currently mlockall() returns an
> error when attempting that, so most likely mlockall() would need to be
> changed to ignore regions mapped with PROT_NONE. On the other hand, the
> pthread implementation uses PROT_NONE to create red zones on the stack
> and mlockall() apparently succeeds with threaded applications (using the
> provided patch). Any opinions/ideas/hints ?
I think it is better to unmap the holes than to cover them with another
mapping.
Please try this patch instead.
diff --git a/libexec/rtld-elf/map_object.c b/libexec/rtld-elf/map_object.c
index 2d06074..3266af0 100644
--- a/libexec/rtld-elf/map_object.c
+++ b/libexec/rtld-elf/map_object.c
@@ -83,6 +83,7 @@ map_object(int fd, const char *path, const struct stat *sb)
Elf_Addr bss_vaddr;
Elf_Addr bss_vlimit;
caddr_t bss_addr;
+ size_t hole;
hdr = get_elf_header(fd, path);
if (hdr == NULL)
@@ -91,8 +92,7 @@ map_object(int fd, const char *path, const struct stat *sb)
/*
* Scan the program header entries, and save key information.
*
- * We rely on there being exactly two load segments, text and data,
- * in that order.
+ * We expect that the loadable segments are ordered by load address.
*/
phdr = (Elf_Phdr *) ((char *)hdr + hdr->e_phoff);
phsize = hdr->e_phnum * sizeof (phdr[0]);
@@ -214,6 +214,17 @@ map_object(int fd, const char *path, const struct stat *sb)
return NULL;
}
}
+
+ /* Unmap the region between two non-adjacent ELF segments */
+ if (i < nsegs) {
+ hole = trunc_page(segs[i + 1]->p_vaddr) - bss_vlimit;
+ if (hole > 0 && munmap(mapbase + bss_vlimit, hole) == -1) {
+ _rtld_error("%s: munmap hole failed: %s", path,
+ strerror(errno));
+ return NULL;
+ }
+ }
+
if (phdr_vaddr == 0 && data_offset <= hdr->e_phoff &&
(data_vlimit - data_vaddr + data_offset) >=
(hdr->e_phoff + hdr->e_phnum * sizeof (Elf_Phdr))) {
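The idea of the patch can be tried outside of rtld. The following is only a
sketch under stated assumptions, not the rtld code: an anonymous PROT_NONE
mapping stands in for the region that rtld reserves, the helper names
(reserve_with_hole, range_is_mapped) are hypothetical, and msync() is used
merely as a probe, relying on the POSIX rule that it fails with ENOMEM when
the range contains unmapped pages.

```c
/* Needed for MAP_ANON with glibc under strict -std modes; harmless elsewhere. */
#define _DEFAULT_SOURCE 1

#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of rtld): reserve `total` bytes of address
 * space with one anonymous mapping, then unmap [hole_off, hole_off + hole_len)
 * so that nothing is left mapped over the gap.  All sizes must be multiples
 * of the page size.  Returns the base address, or MAP_FAILED on error.
 */
static void *
reserve_with_hole(size_t total, size_t hole_off, size_t hole_len)
{
	size_t pagesz = (size_t)sysconf(_SC_PAGESIZE);
	char *base;

	assert(total % pagesz == 0);
	assert(hole_off % pagesz == 0 && hole_len % pagesz == 0);
	assert(hole_off + hole_len <= total);

	base = mmap(NULL, total, PROT_NONE, MAP_PRIVATE | MAP_ANON, -1, 0);
	if (base == MAP_FAILED)
		return (MAP_FAILED);
	if (hole_len > 0 && munmap(base + hole_off, hole_len) == -1) {
		munmap(base, total);
		return (MAP_FAILED);
	}
	return (base);
}

/* Return 1 if every page in [addr, addr + len) is mapped, 0 otherwise. */
static int
range_is_mapped(void *addr, size_t len)
{
	return (msync(addr, len, MS_ASYNC) == 0);
}
```

After reserve_with_hole(8 * pagesz, 2 * pagesz, 3 * pagesz), the first two
and the last three pages should still be mapped while the three pages in
between are not, so a later attempt to wire the whole address space has no
reason to touch the gap.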