svn commit: r51904 - head/en_US.ISO8859-1/articles/linux-emulation
Benedict Reuschling
bcr at FreeBSD.org
Sat Jun 23 14:55:55 UTC 2018
Author: bcr
Date: Sat Jun 23 14:55:54 2018
New Revision: 51904
URL: https://svnweb.freebsd.org/changeset/doc/51904
Log:
Style cleanup, purely cosmetic, no visible content changes:
- Wrap overly long lines
- Use two spaces after a sentence stop in a few places
Modified:
head/en_US.ISO8859-1/articles/linux-emulation/article.xml
Modified: head/en_US.ISO8859-1/articles/linux-emulation/article.xml
==============================================================================
--- head/en_US.ISO8859-1/articles/linux-emulation/article.xml Sat Jun 23 06:57:42 2018 (r51903)
+++ head/en_US.ISO8859-1/articles/linux-emulation/article.xml Sat Jun 23 14:55:54 2018 (r51904)
@@ -3,13 +3,23 @@
"http://www.FreeBSD.org/XML/share/xml/freebsd50.dtd">
<!-- $FreeBSD$ -->
<!-- The FreeBSD Documentation Project -->
-<article xmlns="http://docbook.org/ns/docbook" xmlns:xlink="http://www.w3.org/1999/xlink" version="5.0" xml:lang="en">
- <info><title>&linux; emulation in &os;</title>
-
+<article xmlns="http://docbook.org/ns/docbook"
+ xmlns:xlink="http://www.w3.org/1999/xlink" version="5.0"
+ xml:lang="en">
+ <info>
+ <title>&linux; emulation in &os;</title>
- <author><personname><firstname>Roman</firstname><surname>Divacky</surname></personname><affiliation>
- <address><email>rdivacky at FreeBSD.org</email></address>
- </affiliation></author>
+ <author>
+ <personname>
+ <firstname>Roman</firstname>
+ <surname>Divacky</surname>
+ </personname>
+ <affiliation>
+ <address>
+ <email>rdivacky at FreeBSD.org</email>
+ </address>
+ </affiliation>
+ </author>
<legalnotice xml:id="trademarks" role="trademarks">
&tm-attrib.adobe;
@@ -28,151 +38,165 @@
<releaseinfo>$FreeBSD$</releaseinfo>
<abstract>
- <para>This masters thesis deals with updating the &linux; emulation layer
- (the so called <firstterm>Linuxulator</firstterm>). The task was to update the layer to match
- the functionality of &linux; 2.6. As a reference implementation, the
- &linux; 2.6.16 kernel was chosen. The concept is loosely based on
- the NetBSD implementation. Most of the work was done in the summer
- of 2006 as a part of the Google Summer of Code students program.
- The focus was on bringing the <firstterm>NPTL</firstterm> (new &posix;
- thread library) support into the emulation layer, including
- <firstterm>TLS</firstterm> (thread local storage),
+ <para>This masters thesis deals with updating the &linux;
+ emulation layer (the so called
+ <firstterm>Linuxulator</firstterm>). The task was to update
+ the layer to match the functionality of &linux; 2.6. As a
+ reference implementation, the &linux; 2.6.16 kernel was
+ chosen. The concept is loosely based on the NetBSD
+ implementation. Most of the work was done in the summer of
+ 2006 as a part of the Google Summer of Code students program.
+ The focus was on bringing the <firstterm>NPTL</firstterm> (new
+ &posix; thread library) support into the emulation layer,
+ including <firstterm>TLS</firstterm> (thread local storage),
<firstterm>futexes</firstterm> (fast user space mutexes),
<firstterm>PID mangling</firstterm>, and some other minor
things. Many small problems were identified and fixed in the
process. My work was integrated into the main &os; source
- repository and will be shipped in the upcoming 7.0R release. We,
- the emulation development team, are working on making the
- &linux; 2.6 emulation the default emulation layer in &os;.</para>
+ repository and will be shipped in the upcoming 7.0R release.
+ We, the emulation development team, are working on making the
+ &linux; 2.6 emulation the default emulation layer in
+ &os;.</para>
</abstract>
</info>
<sect1 xml:id="intro">
<title>Introduction</title>
- <para>In the last few years the open source &unix; based operating systems
- started to be widely deployed on server and client machines. Among
- these operating systems I would like to point out two: &os;, for its BSD
- heritage, time proven code base and many interesting features and
- &linux; for its wide user base, enthusiastic open developer community
- and support from large companies. &os; tends to be used on server
- class machines serving heavy duty networking tasks with less usage on
- desktop class machines for ordinary users. While &linux; has the same
- usage on servers, but it is used much more by home based users. This
- leads to a situation where there are many binary only programs available
- for &linux; that lack support for &os;.</para>
+ <para>In the last few years the open source &unix; based operating
+ systems started to be widely deployed on server and client
+ machines. Among these operating systems I would like to point
+ out two: &os;, for its BSD heritage, time proven code base and
+ many interesting features and &linux; for its wide user base,
+ enthusiastic open developer community and support from large
+ companies. &os; tends to be used on server class machines
+ serving heavy duty networking tasks with less usage on desktop
+ class machines for ordinary users. While &linux; has the same
+ usage on servers, but it is used much more by home based users.
+ This leads to a situation where there are many binary only
+ programs available for &linux; that lack support for
+ &os;.</para>
- <para>Naturally, a need for the ability to run &linux; binaries on a &os;
- system arises and this is what this thesis deals with: the emulation of
- the &linux; kernel in the &os; operating system.</para>
+ <para>Naturally, a need for the ability to run &linux; binaries on
+ a &os; system arises and this is what this thesis deals with:
+ the emulation of the &linux; kernel in the &os; operating
+ system.</para>
- <para>During the Summer of 2006 Google Inc. sponsored a project which
- focused on extending the &linux; emulation layer (the so called Linuxulator)
- in &os; to include &linux; 2.6 facilities. This thesis is written as a
- part of this project.</para>
+ <para>During the Summer of 2006 Google Inc. sponsored a project
+ which focused on extending the &linux; emulation layer (the so
+ called Linuxulator) in &os; to include &linux; 2.6 facilities.
+ This thesis is written as a part of this project.</para>
</sect1>
<sect1 xml:id="inside">
<title>A look inside…</title>
- <para>In this section we are going to describe every operating system in
- question. How they deal with syscalls, trapframes etc., all the low-level
- stuff. We also describe the way they understand common &unix;
- primitives like what a PID is, what a thread is, etc. In the third
- subsection we talk about how &unix; on &unix; emulation could be done
- in general.</para>
+ <para>In this section we are going to describe every operating
+ system in question. How they deal with syscalls, trapframes
+ etc., all the low-level stuff. We also describe the way they
+ understand common &unix; primitives like what a PID is, what a
+ thread is, etc. In the third subsection we talk about how
+ &unix; on &unix; emulation could be done in general.</para>
<sect2 xml:id="what-is-unix">
<title>What is &unix;</title>
<para>&unix; is an operating system with a long history that has
- influenced almost every other operating system currently in use.
- Starting in the 1960s, its development continues to this day (although
- in different projects). &unix; development soon forked into two main
- ways: the BSDs and System III/V families. They mutually influenced
- themselves by growing a common &unix; standard. Among the
- contributions originated in BSD we can name virtual memory, TCP/IP
- networking, FFS, and many others. The System V branch contributed to
- SysV interprocess communication primitives, copy-on-write, etc. &unix;
- itself does not exist any more but its ideas have been used by many
- other operating systems world wide thus forming the so called &unix;-like
- operating systems. These days the most influential ones are &linux;,
- Solaris, and possibly (to some extent) &os;. There are in-company
- &unix; derivatives (AIX, HP-UX etc.), but these have been more and
- more migrated to the aforementioned systems. Let us summarize typical
- &unix; characteristics.</para>
+ influenced almost every other operating system currently in
+ use. Starting in the 1960s, its development continues to this
+ day (although in different projects). &unix; development soon
+ forked into two main ways: the BSDs and System III/V families.
+ They mutually influenced themselves by growing a common &unix;
+ standard. Among the contributions originated in BSD we can
+ name virtual memory, TCP/IP networking, FFS, and many others.
+ The System V branch contributed to SysV interprocess
+ communication primitives, copy-on-write, etc. &unix; itself
+ does not exist any more but its ideas have been used by many
+ other operating systems world wide thus forming the so called
+ &unix;-like operating systems. These days the most
+ influential ones are &linux;, Solaris, and possibly (to some
+ extent) &os;. There are in-company &unix; derivatives (AIX,
+ HP-UX etc.), but these have been more and more migrated to the
+ aforementioned systems. Let us summarize typical &unix;
+ characteristics.</para>
</sect2>
<sect2 xml:id="tech-details">
<title>Technical details</title>
- <para>Every running program constitutes a process that represents a state
- of the computation. Running process is divided between kernel-space
- and user-space. Some operations can be done only from kernel space
- (dealing with hardware etc.), but the process should spend most of its
- lifetime in the user space. The kernel is where the management of the
- processes, hardware, and low-level details take place. The kernel
- provides a standard unified &unix; API to the user space. The most
- important ones are covered below.</para>
+ <para>Every running program constitutes a process that
+ represents a state of the computation. Running process is
+ divided between kernel-space and user-space. Some operations
+ can be done only from kernel space (dealing with hardware
+ etc.), but the process should spend most of its lifetime in
+ the user space. The kernel is where the management of the
+ processes, hardware, and low-level details take place. The
+ kernel provides a standard unified &unix; API to the user
+ space. The most important ones are covered below.</para>
<sect3 xml:id="kern-proc-comm">
- <title>Communication between kernel and user space process</title>
+ <title>Communication between kernel and user space
+ process</title>
- <para>Common &unix; API defines a syscall as a way to issue commands
- from a user space process to the kernel. The most common
- implementation is either by using an interrupt or specialized
- instruction (think of
- <literal>SYSENTER</literal>/<literal>SYSCALL</literal> instructions
- for ia32). Syscalls are defined by a number. For example in &os;,
- the syscall number 85 is the &man.swapon.2; syscall and the
- syscall number 132 is &man.mkfifo.2;. Some syscalls need
- parameters, which are passed from the user-space to the kernel-space
- in various ways (implementation dependant). Syscalls are
+ <para>Common &unix; API defines a syscall as a way to issue
+ commands from a user space process to the kernel. The most
+ common implementation is either by using an interrupt or
+ specialized instruction (think of
+ <literal>SYSENTER</literal>/<literal>SYSCALL</literal>
+ instructions for ia32). Syscalls are defined by a number.
+ For example in &os;, the syscall number 85 is the
+ &man.swapon.2; syscall and the syscall number 132 is
+ &man.mkfifo.2;. Some syscalls need parameters, which are
+ passed from the user-space to the kernel-space in various
+ ways (implementation dependant). Syscalls are
synchronous.</para>
<para>Another possible way to communicate is by using a
- <firstterm>trap</firstterm>. Traps occur asynchronously after
- some event occurs (division by zero, page fault etc.). A trap
- can be transparent for a process (page fault) or can result in
- a reaction like sending a <firstterm>signal</firstterm>
- (division by zero).</para>
+ <firstterm>trap</firstterm>. Traps occur asynchronously
+ after some event occurs (division by zero, page fault etc.).
+ A trap can be transparent for a process (page fault) or can
+ result in a reaction like sending a
+ <firstterm>signal</firstterm> (division by zero).</para>
</sect3>
<sect3 xml:id="proc-proc-comm">
<title>Communication between processes</title>
- <para>There are other APIs (System V IPC, shared memory etc.) but the
- single most important API is signal. Signals are sent by processes
- or by the kernel and received by processes. Some signals
- can be ignored or handled by a user supplied routine, some result
- in a predefined action that cannot be altered or ignored.</para>
+ <para>There are other APIs (System V IPC, shared memory etc.)
+ but the single most important API is signal. Signals are
+ sent by processes or by the kernel and received by
+ processes. Some signals can be ignored or handled by a user
+ supplied routine, some result in a predefined action that
+ cannot be altered or ignored.</para>
</sect3>
<sect3 xml:id="proc-mgmt">
<title>Process management</title>
- <para>Kernel instances are processed first in the system (so called
- init). Every running process can create its identical copy using
- the &man.fork.2; syscall. Some slightly modified versions of this
- syscall were introduced but the basic semantic is the same. Every
- running process can morph into some other process using the
- &man.exec.3; syscall. Some modifications of this syscall were
- introduced but all serve the same basic purpose. Processes end
- their lives by calling the &man.exit.2; syscall. Every process is
- identified by a unique number called PID. Every process has a
- defined parent (identified by its PID).</para>
+ <para>Kernel instances are processed first in the system (so
+ called init). Every running process can create its
+ identical copy using the &man.fork.2; syscall. Some
+ slightly modified versions of this syscall were introduced
+ but the basic semantic is the same. Every running process
+ can morph into some other process using the &man.exec.3;
+ syscall. Some modifications of this syscall were introduced
+ but all serve the same basic purpose. Processes end their
+ lives by calling the &man.exit.2; syscall. Every process is
+ identified by a unique number called PID. Every process has
+ a defined parent (identified by its PID).</para>
</sect3>
<sect3 xml:id="thread-mgmt">
<title>Thread management</title>
- <para>Traditional &unix; does not define any API nor implementation
- for threading, while &posix; defines its threading API but the
- implementation is undefined. Traditionally there were two ways of
- implementing threads. Handling them as separate processes (1:1
- threading) or envelope the whole thread group in one process and
- managing the threading in userspace (1:N threading). Comparing
- main features of each approach:</para>
+ <para>Traditional &unix; does not define any API nor
+ implementation for threading, while &posix; defines its
+ threading API but the implementation is undefined.
+ Traditionally there were two ways of implementing threads.
+ Handling them as separate processes (1:1 threading) or
+ envelope the whole thread group in one process and managing
+ the threading in userspace (1:N threading). Comparing main
+ features of each approach:</para>
<para>1:1 threading</para>
@@ -199,10 +223,11 @@
<para>+ lightweight threads</para>
</listitem>
<listitem>
- <para>+ scheduling can be easily altered by the user</para>
+ <para>+ scheduling can be easily altered by the
+ user</para>
</listitem>
<listitem>
- <para>- syscalls must be wrapped </para>
+ <para>- syscalls must be wrapped</para>
</listitem>
<listitem>
<para>- cannot utilize more than one CPU</para>
@@ -214,24 +239,26 @@
<sect2 xml:id="what-is-freebsd">
<title>What is &os;?</title>
- <para>The &os; project is one of the oldest open source operating
- systems currently available for daily use. It is a direct descendant
- of the genuine &unix; so it could be claimed that it is a true &unix;
- although licensing issues do not permit that. The start of the project
- dates back to the early 1990's when a crew of fellow BSD users patched
- the 386BSD operating system. Based on this patchkit a new operating
- system arose named &os; for its liberal license. Another group created
- the NetBSD operating system with different goals in mind. We will
- focus on &os;.</para>
+ <para>The &os; project is one of the oldest open source
+ operating systems currently available for daily use. It is a
+ direct descendant of the genuine &unix; so it could be claimed
+ that it is a true &unix; although licensing issues do not
+ permit that. The start of the project dates back to the early
+ 1990's when a crew of fellow BSD users patched the 386BSD
+ operating system. Based on this patchkit a new operating
+ system arose named &os; for its liberal license. Another
+ group created the NetBSD operating system with different goals
+ in mind. We will focus on &os;.</para>
- <para>&os; is a modern &unix;-based operating system with all the
- features of &unix;. Preemptive multitasking, multiuser facilities,
- TCP/IP networking, memory protection, symmetric multiprocessing
- support, virtual memory with merged VM and buffer cache, they are all
- there. One of the interesting and extremely useful features is the
- ability to emulate other &unix;-like operating systems. As of
- December 2006 and 7-CURRENT development, the following
- emulation functionalities are supported:</para>
+ <para>&os; is a modern &unix;-based operating system with all
+ the features of &unix;. Preemptive multitasking, multiuser
+ facilities, TCP/IP networking, memory protection, symmetric
+ multiprocessing support, virtual memory with merged VM and
+ buffer cache, they are all there. One of the interesting and
+ extremely useful features is the ability to emulate other
+ &unix;-like operating systems. As of December 2006 and
+ 7-CURRENT development, the following emulation functionalities
+ are supported:</para>
<itemizedlist>
<listitem>
@@ -241,10 +268,12 @@
<para>&os;/i386 emulation on &os;/ia64</para>
</listitem>
<listitem>
- <para>&linux;-emulation of &linux; operating system on &os;</para>
+ <para>&linux;-emulation of &linux; operating system on
+ &os;</para>
</listitem>
<listitem>
- <para>NDIS-emulation of Windows networking drivers interface</para>
+ <para>NDIS-emulation of Windows networking drivers
+ interface</para>
</listitem>
<listitem>
<para>NetBSD-emulation of NetBSD operating system</para>
@@ -257,62 +286,70 @@
</listitem>
</itemizedlist>
- <para>Actively developed emulations are the &linux; layer and various
- &os;-on-&os; layers. Others are not supposed to work properly nor
- be usable these days.</para>
+ <para>Actively developed emulations are the &linux; layer and
+ various &os;-on-&os; layers. Others are not supposed to work
+ properly nor be usable these days.</para>
<sect3 xml:id="freebsd-tech-details">
<title>Technical details</title>
- <para>&os; is traditional flavor of &unix; in the sense of dividing the
- run of processes into two halves: kernel space and user space run.
- There are two types of process entry to the kernel: a syscall and a
- trap. There is only one way to return. In the subsequent sections
- we will describe the three gates to/from the kernel. The whole
- description applies to the i386 architecture as the Linuxulator
- only exists there but the concept is similar on other architectures.
- The information was taken from [1] and the source code.</para>
+ <para>&os; is traditional flavor of &unix; in the sense of
+ dividing the run of processes into two halves: kernel space
+ and user space run. There are two types of process entry to
+ the kernel: a syscall and a trap. There is only one way to
+ return. In the subsequent sections we will describe the
+ three gates to/from the kernel. The whole description
+ applies to the i386 architecture as the Linuxulator only
+ exists there but the concept is similar on other
+ architectures. The information was taken from [1] and the
+ source code.</para>
<sect4 xml:id="freebsd-sys-entries">
<title>System entries</title>
- <para>&os; has an abstraction called an execution class loader,
- which is a wedge into the &man.execve.2; syscall. This employs a
- structure <literal>sysentvec</literal>, which describes an
- executable ABI. It contains things like errno translation table,
- signal translation table, various functions to serve syscall needs
- (stack fixup, coredumping, etc.). Every ABI the &os; kernel wants
- to support must define this structure, as it is used later in the
- syscall processing code and at some other places. System entries
- are handled by trap handlers, where we can access both the
- kernel-space and the user-space at once.</para>
+ <para>&os; has an abstraction called an execution class
+ loader, which is a wedge into the &man.execve.2; syscall.
+ This employs a structure <literal>sysentvec</literal>,
+ which describes an executable ABI. It contains things
+ like errno translation table, signal translation table,
+ various functions to serve syscall needs (stack fixup,
+ coredumping, etc.). Every ABI the &os; kernel wants to
+ support must define this structure, as it is used later in
+ the syscall processing code and at some other places.
+ System entries are handled by trap handlers, where we can
+ access both the kernel-space and the user-space at
+ once.</para>
</sect4>
<sect4 xml:id="freebsd-syscalls">
<title>Syscalls</title>
<para>Syscalls on &os; are issued by executing interrupt
- <literal>0x80</literal> with register <varname>%eax</varname> set
- to a desired syscall number with arguments passed on the stack.</para>
+ <literal>0x80</literal> with register
+ <varname>%eax</varname> set to a desired syscall number
+ with arguments passed on the stack.</para>
- <para>When a process issues an interrupt <literal>0x80</literal>, the
- <literal>int0x80</literal> syscall trap handler is issued (defined
- in <filename>sys/i386/i386/exception.s</filename>), which prepares
- arguments (i.e. copies them on to the stack) for a
- call to a C function &man.syscall.2; (defined in
- <filename>sys/i386/i386/trap.c</filename>), which processes the
- passed in trapframe. The processing consists of preparing the
- syscall (depending on the <literal>sysvec</literal> entry),
- determining if the syscall is 32-bit or 64-bit one (changes size
- of the parameters), then the parameters are copied, including the
- syscall. Next, the actual syscall function is executed with
- processing of the return code (special cases for
- <literal>ERESTART</literal> and <literal>EJUSTRETURN</literal>
- errors). Finally an <literal>userret()</literal> is scheduled,
- switching the process back to the users-pace. The parameters to
- the actual syscall handler are passed in the form of
- <literal>struct thread *td</literal>,
- <literal>struct syscall args *</literal> arguments where the second
+ <para>When a process issues an interrupt
+ <literal>0x80</literal>, the <literal>int0x80</literal>
+ syscall trap handler is issued (defined in
+ <filename>sys/i386/i386/exception.s</filename>), which
+ prepares arguments (i.e. copies them on to the stack) for
+ a call to a C function &man.syscall.2; (defined in
+ <filename>sys/i386/i386/trap.c</filename>), which
+ processes the passed in trapframe. The processing
+ consists of preparing the syscall (depending on the
+ <literal>sysvec</literal> entry), determining if the
+ syscall is 32-bit or 64-bit one (changes size of the
+ parameters), then the parameters are copied, including the
+ syscall. Next, the actual syscall function is executed
+ with processing of the return code (special cases for
+ <literal>ERESTART</literal> and
+ <literal>EJUSTRETURN</literal> errors). Finally an
+ <literal>userret()</literal> is scheduled, switching the
+ process back to the users-pace. The parameters to the
+ actual syscall handler are passed in the form of
+ <literal>struct thread *td</literal>, <literal>struct
+ syscall args *</literal> arguments where the second
parameter is a pointer to the copied in structure of
parameters.</para>
</sect4>
@@ -320,68 +357,76 @@
<sect4 xml:id="freebsd-traps">
<title>Traps</title>
- <para>Handling of traps in &os; is similar to the handling of
- syscalls. Whenever a trap occurs, an assembler handler is called.
- It is chosen between alltraps, alltraps with regs pushed or
- calltrap depending on the type of the trap. This handler prepares
- arguments for a call to a C function <literal>trap()</literal>
- (defined in <filename>sys/i386/i386/trap.c</filename>), which then
- processes the occurred trap. After the processing it might send a
- signal to the process and/or exit to userland using
- <literal>userret()</literal>.</para>
+ <para>Handling of traps in &os; is similar to the handling
+ of syscalls. Whenever a trap occurs, an assembler handler
+ is called. It is chosen between alltraps, alltraps with
+ regs pushed or calltrap depending on the type of the trap.
+ This handler prepares arguments for a call to a C function
+ <literal>trap()</literal> (defined in
+ <filename>sys/i386/i386/trap.c</filename>), which then
+ processes the occurred trap. After the processing it
+ might send a signal to the process and/or exit to userland
+ using <literal>userret()</literal>.</para>
</sect4>
<sect4 xml:id="freebsd-exits">
<title>Exits</title>
- <para>Exits from kernel to userspace happen using the assembler
- routine <literal>doreti</literal> regardless of whether the kernel
- was entered via a trap or via a syscall. This restores the program
- status from the stack and returns to the userspace.</para>
+ <para>Exits from kernel to userspace happen using the
+ assembler routine <literal>doreti</literal> regardless of
+ whether the kernel was entered via a trap or via a
+ syscall. This restores the program status from the stack
+ and returns to the userspace.</para>
</sect4>
<sect4 xml:id="freebsd-unix-primitives">
<title>&unix; primitives</title>
- <para>&os; operating system adheres to the traditional &unix; scheme,
- where every process has a unique identification number, the so
- called <firstterm>PID</firstterm> (Process ID). PID numbers are
+ <para>&os; operating system adheres to the traditional
+ &unix; scheme, where every process has a unique
+ identification number, the so called
+ <firstterm>PID</firstterm> (Process ID). PID numbers are
allocated either linearly or randomly ranging from
- <literal>0</literal> to <literal>PID_MAX</literal>. The allocation
- of PID numbers is done using linear searching of PID space. Every
- thread in a process receives the same PID number as result of the
- &man.getpid.2; call.</para>
+ <literal>0</literal> to <literal>PID_MAX</literal>. The
+ allocation of PID numbers is done using linear searching
+ of PID space. Every thread in a process receives the same
+ PID number as result of the &man.getpid.2; call.</para>
- <para>There are currently two ways to implement threading in &os;.
- The first way is M:N threading followed by the 1:1 threading model.
- The default library used is M:N threading
- (<literal>libpthread</literal>) and you can switch at runtime to
- 1:1 threading (<literal>libthr</literal>). The plan is to switch
- to 1:1 library by default soon. Although those two libraries use
- the same kernel primitives, they are accessed through different
- API(es). The M:N library uses the <literal>kse_*</literal> family
- of syscalls while the 1:1 library uses the <literal>thr_*</literal>
- family of syscalls. Because of this, there is no general concept
- of thread ID shared between kernel and userspace. Of course, both
- threading libraries implement the pthread thread ID API. Every
- kernel thread (as described by <literal>struct thread</literal>)
- has td tid identifier but this is not directly accessible
- from userland and solely serves the kernel's needs. It is also
- used for 1:1 threading library as pthread's thread ID but handling
- of this is internal to the library and cannot be relied on.</para>
+ <para>There are currently two ways to implement threading in
+ &os;. The first way is M:N threading followed by the 1:1
+ threading model. The default library used is M:N
+ threading (<literal>libpthread</literal>) and you can
+ switch at runtime to 1:1 threading
+ (<literal>libthr</literal>). The plan is to switch to 1:1
+ library by default soon. Although those two libraries use
+ the same kernel primitives, they are accessed through
+ different API(es). The M:N library uses the
+ <literal>kse_*</literal> family of syscalls while the 1:1
+ library uses the <literal>thr_*</literal> family of
+ syscalls. Because of this, there is no general concept of
+ thread ID shared between kernel and userspace. Of course,
+ both threading libraries implement the pthread thread ID
+ API. Every kernel thread (as described by <literal>struct
+ thread</literal>) has td tid identifier but this is not
+ directly accessible from userland and solely serves the
+ kernel's needs. It is also used for 1:1 threading library
+ as pthread's thread ID but handling of this is internal to
+ the library and cannot be relied on.</para>
- <para>As stated previously there are two implementations of threading
- in &os;. The M:N library divides the work between kernel space and
- userspace. Thread is an entity that gets scheduled in the kernel
- but it can represent various number of userspace threads.
- M userspace threads get mapped to N kernel threads thus saving
- resources while keeping the ability to exploit multiprocessor
- parallelism. Further information about the implementation can be
- obtained from the man page or [1]. The 1:1 library directly maps a
- userland thread to a kernel thread thus greatly simplifying the
- scheme. None of these designs implement a fairness mechanism (such
- a mechanism was implemented but it was removed recently because it
- caused serious slowdown and made the code more difficult to deal
+ <para>As stated previously there are two implementations of
+ threading in &os;. The M:N library divides the work
+ between kernel space and userspace. Thread is an entity
+ that gets scheduled in the kernel but it can represent
+ various number of userspace threads. M userspace threads
+ get mapped to N kernel threads thus saving resources while
+ keeping the ability to exploit multiprocessor parallelism.
+ Further information about the implementation can be
+ obtained from the man page or [1]. The 1:1 library
+ directly maps a userland thread to a kernel thread thus
+ greatly simplifying the scheme. None of these designs
+ implement a fairness mechanism (such a mechanism was
+ implemented but it was removed recently because it caused
+ serious slowdown and made the code more difficult to deal
with).</para>
</sect4>
</sect3>
@@ -390,64 +435,70 @@
<sect2 xml:id="what-is-linux">
<title>What is &linux;</title>
- <para>&linux; is a &unix;-like kernel originally developed by Linus
- Torvalds, and now being contributed to by a massive crowd of
- programmers all around the world. From its mere beginnings to today,
- with wide support from companies such as IBM or Google, &linux; is
- being associated with its fast development pace, full hardware support
- and benevolent dictator model of organization.</para>
+ <para>&linux; is a &unix;-like kernel originally developed by
+ Linus Torvalds, and now being contributed to by a massive
+ crowd of programmers all around the world. From its mere
+ beginnings to today, with wide support from companies such as
+ IBM or Google, &linux; is being associated with its fast
+ development pace, full hardware support and benevolent
+ dictator model of organization.</para>
- <para>&linux; development started in 1991 as a hobbyist project at
- University of Helsinki in Finland. Since then it has obtained all the
- features of a modern &unix;-like OS: multiprocessing, multiuser
- support, virtual memory, networking, basically everything is there.
- There are also highly advanced features like virtualization etc.</para>
+ <para>&linux; development started in 1991 as a hobbyist project
+ at University of Helsinki in Finland. Since then it has
+ obtained all the features of a modern &unix;-like OS:
+ multiprocessing, multiuser support, virtual memory,
+ networking, basically everything is there. There are also
+ highly advanced features like virtualization etc.</para>
- <para>As of 2006 &linux; seems to be the most widely used open source
- operating system with support from independent software vendors like
- Oracle, RealNetworks, Adobe, etc. Most of the commercial software
- distributed for &linux; can only be obtained in a binary form so
- recompilation for other operating systems is impossible.</para>
+ <para>As of 2006 &linux; seems to be the most widely used open
+ source operating system with support from independent software
+ vendors like Oracle, RealNetworks, Adobe, etc. Most of the
+ commercial software distributed for &linux; can only be
+ obtained in a binary form so recompilation for other operating
+ systems is impossible.</para>
<para>Most of the &linux; development happens in a
<application>Git</application> version control system.
- <application>Git</application> is a distributed system so there is
- no central source of the &linux; code, but some branches are considered
- prominent and official. The version number scheme implemented by
- &linux; consists of four numbers A.B.C.D. Currently development
- happens in 2.6.C.D, where C represents major version, where new
- features are added or changed while D is a minor version for bugfixes
- only.</para>
+ <application>Git</application> is a distributed system so
+ there is no central source of the &linux; code, but some
+ branches are considered prominent and official. The version
+ number scheme implemented by &linux; consists of four numbers
+ A.B.C.D. Currently development happens in 2.6.C.D, where C
+ represents major version, where new features are added or
+ changed while D is a minor version for bugfixes only.</para>
<para>More information can be obtained from [3].</para>
<sect3 xml:id="linux-tech-details">
<title>Technical details</title>
- <para>&linux; follows the traditional &unix; scheme of dividing the run
- of a process in two halves: the kernel and user space. The kernel can
- be entered in two ways: via a trap or via a syscall. The return is
- handled only in one way. The further description applies to
- &linux; 2.6 on the &i386; architecture. This information was
- taken from [2].</para>
+ <para>&linux; follows the traditional &unix; scheme of
+ dividing the run of a process in two halves: the kernel and
+ user space. The kernel can be entered in two ways: via a
+ trap or via a syscall. The return is handled only in one
+ way. The further description applies to &linux; 2.6 on
+ the &i386; architecture. This information was taken from
+ [2].</para>
<sect4 xml:id="linux-syscalls">
<title>Syscalls</title>
<para>Syscalls in &linux; are performed (in userspace) using
- <literal>syscallX</literal> macros where X substitutes a number
- representing the number of parameters of the given syscall. This
- macro translates to a code that loads <varname>%eax</varname>
- register with a number of the syscall and executes interrupt
- <literal>0x80</literal>. After this syscall return is called,
- which translates negative return values to positive
- <literal>errno</literal> values and sets <literal>res</literal> to
- <literal>-1</literal> in case of an error. Whenever the interrupt
- <literal>0x80</literal> is called the process enters the kernel in
- system call trap handler. This routine saves all registers on the
- stack and calls the selected syscall entry. Note that the &linux;
- calling convention expects parameters to the syscall to be passed
- via registers as shown here:</para>
+ <literal>syscallX</literal> macros where X substitutes a
+ number representing the number of parameters of the given
+ syscall. This macro translates to code that loads the
+ <varname>%eax</varname> register with the number of the
+ syscall and executes interrupt <literal>0x80</literal>.
+ After this, syscall return is called, which translates
+ negative return values to positive
+ <literal>errno</literal> values and sets
+ <literal>res</literal> to <literal>-1</literal> in case of
+ an error. Whenever the interrupt <literal>0x80</literal>
+ is called the process enters the kernel in system call
+ trap handler. This routine saves all registers on the
+ stack and calls the selected syscall entry. Note that the
+ &linux; calling convention expects parameters to the
+ syscall to be passed via registers as shown here:</para>
<orderedlist>
<listitem>
@@ -470,53 +521,58 @@
</listitem>
</orderedlist>
- <para>There are some exceptions to this, where &linux; uses different
- calling convention (most notably the <literal>clone</literal>
- syscall).</para>
+ <para>There are some exceptions to this, where &linux; uses
+ different calling convention (most notably the
+ <literal>clone</literal> syscall).</para>
</sect4>
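The return-value translation described in the Syscalls section can be sketched in C. This is only a model of the behaviour, not the actual &linux; macro: the names `sketch_errno`, `SKETCH_MAX_ERRNO` and `sketch_syscall_return` are hypothetical, and the cutoff of 128 error codes is an assumption made for illustration.

```c
/* Sketch of the userspace return path; all names are hypothetical and
 * only model the behaviour, they are not the real kernel macros. */
static int sketch_errno;            /* stands in for the real errno */

#define SKETCH_MAX_ERRNO 128        /* assumed upper bound on error codes */

/* Translate the raw value the kernel leaves in %eax into the C
 * convention: values in [-SKETCH_MAX_ERRNO, -1] become -1 with the
 * positive error code stored in sketch_errno; anything else is
 * returned unchanged. */
static long sketch_syscall_return(long res)
{
	if (res < 0 && res >= -SKETCH_MAX_ERRNO) {
		sketch_errno = (int)-res;
		return -1;
	}
	return res;
}
```

Only a small window of negative values is treated as errors because some syscalls legitimately return large values that look negative when read as a signed integer.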
<sect4 xml:id="linux-traps">
<title>Traps</title>
<para>The trap handlers are introduced in
- <filename>arch/i386/kernel/traps.c</filename> and most of these
- handlers live in <filename>arch/i386/kernel/entry.S</filename>,
- where handling of the traps happens.</para>
+ <filename>arch/i386/kernel/traps.c</filename> and most of
+ these handlers live in
+ <filename>arch/i386/kernel/entry.S</filename>, where
+ handling of the traps happens.</para>
</sect4>
<sect4 xml:id="linux-exits">
<title>Exits</title>
- <para>Return from the syscall is managed by syscall &man.exit.3;,
- which checks for the process having unfinished work, then checks
- whether we used user-supplied selectors. If this happens stack
- fixing is applied and finally the registers are restored from the
- stack and the process returns to the userspace.</para>
+ <para>Return from the syscall is managed by syscall
+ &man.exit.3;, which checks for the process having
+ unfinished work, then checks whether we used user-supplied
+ selectors. If this happens stack fixing is applied and
+ finally the registers are restored from the stack and the
+ process returns to the userspace.</para>
</sect4>
<sect4 xml:id="linux-unix-primitives">
<title>&unix; primitives</title>
- <para>In the 2.6 version, the &linux; operating system redefined some
- of the traditional &unix; primitives, notably PID, TID and thread.
- PID is defined not to be unique for every process, so for some
- processes (threads) &man.getppid.2; returns the same value. Unique
- identification of process is provided by TID. This is because
- <firstterm>NPTL</firstterm> (New &posix; Thread Library) defines
- threads to be normal processes (so called 1:1 threading). Spawning
- a new process in &linux; 2.6 happens using the
- <literal>clone</literal> syscall (fork variants are reimplemented using
- it). This clone syscall defines a set of flags that affect
- behavior of the cloning process regarding thread implementation.
- The semantic is a bit fuzzy as there is no single flag telling the
- syscall to create a thread.</para>
+ <para>In the 2.6 version, the &linux; operating system
+ redefined some of the traditional &unix; primitives,
+ notably PID, TID and thread. PID is defined not to be
+ unique for every process, so for some processes (threads)
+ &man.getpid.2; returns the same value. Unique
+ identification of a process is provided by TID. This is
+ because <firstterm>NPTL</firstterm> (New &posix; Thread
+ Library) defines threads to be normal processes (so called
+ 1:1 threading). Spawning a new process in
+ &linux; 2.6 happens using the
+ <literal>clone</literal> syscall (fork variants are
+ reimplemented using it). This clone syscall defines a set
+ of flags that affect behavior of the cloning process
+ regarding thread implementation. The semantic is a bit
+ fuzzy as there is no single flag telling the syscall to
+ create a thread.</para>
<para>Implemented clone flags are:</para>
<itemizedlist>
<listitem>
- <para><literal>CLONE_VM</literal> - processes share their memory
- space</para>
+ <para><literal>CLONE_VM</literal> - processes share
+ their memory space</para>
</listitem>
<listitem>
<para><literal>CLONE_FS</literal> - share umask, cwd and
@@ -527,72 +583,78 @@
files</para>
</listitem>
<listitem>
- <para><literal>CLONE_SIGHAND</literal> - share signal handlers
- and blocked signals</para>
+ <para><literal>CLONE_SIGHAND</literal> - share signal
+ handlers and blocked signals</para>
</listitem>
<listitem>
- <para><literal>CLONE_PARENT</literal> - share parent</para>
+ <para><literal>CLONE_PARENT</literal> - share
+ parent</para>
</listitem>
<listitem>
- <para><literal>CLONE_THREAD</literal> - be thread (further
- explanation below)</para>
+ <para><literal>CLONE_THREAD</literal> - be thread
+ (further explanation below)</para>
</listitem>
<listitem>
- <para><literal>CLONE_NEWNS</literal> - new namespace</para>
+ <para><literal>CLONE_NEWNS</literal> - new
+ namespace</para>
</listitem>
<listitem>
<para><literal>CLONE_SYSVSEM</literal> - share SysV undo
structures</para>
</listitem>
<listitem>
- <para><literal>CLONE_SETTLS</literal> - setup TLS at supplied
- address</para>
+ <para><literal>CLONE_SETTLS</literal> - setup TLS at
+ supplied address</para>
</listitem>
<listitem>
- <para><literal>CLONE_PARENT_SETTID</literal> - set TID in the
- parent</para>
+ <para><literal>CLONE_PARENT_SETTID</literal> - set TID
+ in the parent</para>
</listitem>
<listitem>
- <para><literal>CLONE_CHILD_CLEARTID</literal> - clear TID in the
- child</para>
+ <para><literal>CLONE_CHILD_CLEARTID</literal> - clear
+ TID in the child</para>
</listitem>
<listitem>
- <para><literal>CLONE_CHILD_SETTID</literal> - set TID in the
- child</para>
+ <para><literal>CLONE_CHILD_SETTID</literal> - set TID in
+ the child</para>
</listitem>
</itemizedlist>
- <para><literal>CLONE_PARENT</literal> sets the real parent to the
- parent of the caller. This is useful for threads because if thread
- A creates thread B we want thread B to be parented to the parent of
- the whole thread group. <literal>CLONE_THREAD</literal> does
- exactly the same thing as <literal>CLONE_PARENT</literal>,
- <literal>CLONE_VM</literal> and <literal>CLONE_SIGHAND</literal>,
- rewrites PID to be the same as PID of the caller, sets exit signal
- to be none and enters the thread group.
- <literal>CLONE_SETTLS</literal> sets up GDT entries for TLS
- handling. The <literal>CLONE_*_*TID</literal> set of flags
- sets/clears user supplied address to TID or 0.</para>
+ <para><literal>CLONE_PARENT</literal> sets the real parent
+ to the parent of the caller. This is useful for threads
+ because if thread A creates thread B we want thread B to
+ be parented to the parent of the whole thread group.
+ <literal>CLONE_THREAD</literal> does exactly the same
+ thing as <literal>CLONE_PARENT</literal>,
+ <literal>CLONE_VM</literal> and
+ <literal>CLONE_SIGHAND</literal>, rewrites PID to be the
+ same as PID of the caller, sets exit signal to be none and
+ enters the thread group. <literal>CLONE_SETTLS</literal>
+ sets up GDT entries for TLS handling. The
+ <literal>CLONE_*_*TID</literal> set of flags sets/clears
+ user supplied address to TID or 0.</para>
- <para>As you can see the <literal>CLONE_THREAD</literal> does most
- of the work and does not seem to fit the scheme very well. The
- original intention is unclear (even for authors, according to
- comments in the code) but I think originally there was one
- threading flag, which was then parcelled among many other flags
- but this separation was never fully finished. It is also unclear
- what this partition is good for as glibc does not use that so only
- hand-written use of the clone permits a programmer to access this
- features.</para>
+ <para>As you can see the <literal>CLONE_THREAD</literal>
+ does most of the work and does not seem to fit the scheme
+ very well. The original intention is unclear (even for
+ authors, according to comments in the code) but I think
+ originally there was one threading flag, which was then
+ parcelled among many other flags but this separation was
+ never fully finished. It is also unclear what this
+ partition is good for, as glibc does not use it, so only
+ hand-written use of clone permits a programmer to
+ access these features.</para>
- <para>For non-threaded programs the PID and TID are the same. For
- threaded programs the first thread PID and TID are the same and
- every created thread shares the same PID and gets assigned a
- unique TID (because <literal>CLONE_THREAD</literal> is passed in)
- also parent is shared for all processes forming this threaded
+ <para>For non-threaded programs the PID and TID are the
+ same. For threaded programs the first thread PID and TID
+ are the same and every created thread shares the same PID
+ and gets assigned a unique TID (because
+ <literal>CLONE_THREAD</literal> is passed in). The parent
+ is also shared by all processes forming this threaded
program.</para>
- <para>The code that implements &man.pthread.create.3; in NPTL defines
- the clone flags like this:</para>
+ <para>The code that implements &man.pthread.create.3; in
+ NPTL defines the clone flags like this:</para>
<programlisting>int clone_flags = (CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SIGNAL
@@ -606,12 +668,13 @@
| 0);</programlisting>
- <para>The <literal>CLONE_SIGNAL</literal> is defined like</para>
+ <para>The <literal>CLONE_SIGNAL</literal> is defined
+ like</para>
<programlisting>#define CLONE_SIGNAL (CLONE_SIGHAND | CLONE_THREAD)</programlisting>
- <para>the last 0 means no signal is sent when any of the threads
- exits.</para>
+ <para>the last 0 means no signal is sent when any of the
+ threads exits.</para>
</sect4>
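How a flag word is read can be sketched as follows. The flag values are the ones from the &linux; `<linux/sched.h>` header, reproduced so that the sketch compiles on any system; the two helper functions are hypothetical and exist only for illustration.

```c
/* Flag values as defined in Linux's <linux/sched.h>, reproduced here
 * so the sketch does not depend on Linux headers. */
#define CSIGNAL        0x000000ff  /* low byte: signal sent on exit */
#define CLONE_VM       0x00000100
#define CLONE_SIGHAND  0x00000800
#define CLONE_THREAD   0x00010000
#define CLONE_SIGNAL   (CLONE_SIGHAND | CLONE_THREAD)

/* Hypothetical helper: does this flag word ask for a thread, that is,
 * a task that joins the caller's thread group and shares its PID? */
static int sketch_is_thread(unsigned long flags)
{
	return (flags & CLONE_THREAD) != 0;
}

/* Hypothetical helper: the exit signal lives in the low byte of the
 * flag word, so OR-ing in 0 requests no signal at all. */
static int sketch_exit_signal(unsigned long flags)
{
	return (int)(flags & CSIGNAL);
}
```

With these definitions, `CLONE_VM | CLONE_SIGNAL | 0` describes a thread with no exit signal, while `CLONE_VM | 17` (17 being `SIGCHLD` on &linux;/&i386;) describes a separate process that shares memory and signals its parent on exit.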
</sect3>
</sect2>
@@ -619,71 +682,80 @@
<sect2 xml:id="what-is-emu">
<title>What is emulation</title>
- <para>According to a dictionary definition, emulation is the ability of
- a program or device to imitate another program or device. This is
- achieved by providing the same reaction to a given stimulus as the
- emulated object. In practice, the software world mostly sees three
- types of emulation - a program used to emulate a machine (QEMU, various
- game console emulators etc.), software emulation of a hardware facility
- (OpenGL emulators, floating point units emulation etc.) and operating
- system emulation (either in kernel of the operating system or as a
- userspace program).</para>
+ <para>According to a dictionary definition, emulation is the
+ ability of a program or device to imitate another program or
+ device. This is achieved by providing the same reaction to a
+ given stimulus as the emulated object. In practice, the
+ software world mostly sees three types of emulation - a
+ program used to emulate a machine (QEMU, various game console
+ emulators etc.), software emulation of a hardware facility
+ (OpenGL emulators, floating point units emulation etc.) and
+ operating system emulation (either in kernel of the operating
+ system or as a userspace program).</para>
- <para>Emulation is usually used in a place, where using the original
- component is not feasible nor possible at all. For example someone
- might want to use a program developed for a different operating
- system than they use. Then emulation comes in handy. Sometimes
- there is no other way but to use emulation - e.g. when the hardware
- device you try to use does not exist (yet/anymore) then there is no
- other way but emulation. This happens often when porting an operating
+ <para>Emulation is usually used where using the
+ original component is not feasible or not possible at all. For
+ example someone might want to use a program developed for a
+ different operating system than they use. Then emulation
+ comes in handy. Sometimes there is no other way but to use
+ emulation - e.g. when the hardware device you try to use does
+ not exist (yet/anymore) then there is no other way but
+ emulation. This happens often when porting an operating
system to a new (non-existent) platform. Sometimes it is just
cheaper to emulate.</para>
- <para>Looking from an implementation point of view, there are two main
- approaches to the implementation of emulation. You can either emulate
- the whole thing - accepting possible inputs of the original object,
- maintaining inner state and emitting correct output based on the state
- and/or input. This kind of emulation does not require any special
- conditions and basically can be implemented anywhere for any
- device/program. The drawback is that implementing such emulation is
- quite difficult, time-consuming and error-prone. In some cases we can
- use a simpler approach. Imagine you want to emulate a printer that
- prints from left to right on a printer that prints from right to left.
- It is obvious that there is no need for a complex emulation layer but
- simply reversing of the printed text is sufficient. Sometimes the
- emulating environment is very similar to the emulated one so just a
- thin layer of some translation is necessary to provide fully working
- emulation! As you can see this is much less demanding to implement,
- so less time-consuming and error-prone than the previous approach. But
- the necessary condition is that the two environments must be similar
*** DIFF OUTPUT TRUNCATED AT 1000 LINES ***