<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Strict//EN">
<html>
<head>
<title>Synthesis: An Efficient Implementation of Fundamental Operating System Services - Abstract</title>
<link rel="stylesheet" type="text/css" href="../css/style.css">
<link rel="stylesheet" type="text/css" href="style.css">
</head>
<body>
<div id="nav">
<a class=home href="../index.html">Alexia's Home</a>
<a href="index.html">Dissertation</a>
<a href="abs.html">Abstract</a>
<a href="ack.html">Acknowledgements</a>
<a href="toc.html">Contents</a>
<a href="ch1.html">Chapter 1</a>
<a href="ch2.html">Chapter 2</a>
<a href="ch3.html">Chapter 3</a>
<a class=here href="ch4.html">Chapter 4</a>
<a href="ch5.html">Chapter 5</a>
<a href="ch6.html">Chapter 6</a>
<a href="ch7.html">Chapter 7</a>
<a href="ch8.html">Chapter 8</a>
<a href="bib.html">Bibliography</a>
<a href="app-A.html">Appendix A</a>
</div>
<div id="running-title">
Synthesis: An Efficient Implementation of Fundamental Operating System Services - Abstract
</div>
<div id="content">
<h1>4. Kernel Structure</h1>
<div id="chapter-quote">
All things should be made as simple as possible, but no simpler.<br>
-- Albert Einstein
</div>
<h2>4.1 Quajects</h2>
<p><em>Quajects</em> are the building blocks out of which all Synthesis kernel services are composed. The name is derived from the term "object" of Object-Oriented (O-O) systems, which they strongly resemble [32]. The similarity is strong, but the difference is significant. Like objects, quajects encapsulate data and provide a well-defined interface to access it. Unlike objects, quajects use a code-synthesis implementation to achieve high performance, but lack high-level language support and inheritance.
<p>Kernel quajects can be broadly classified into four kinds: thread, memory, I/O, and device. Thread quajects encapsulate the unit of execution, memory quajects the unit of data storage, I/O quajects the unit of data movement, and device quajects the machine's interfaces to the outside world. Each kind of quaject is defined and implemented independently.
<p>Basic quajects implement fundamental services that cannot be had through any combination of other quajects. Threads and queues are two examples of basic quajects.
<table class=table>
<caption>
Table 4.1: List of Basic Quajects
</caption>
<tr class=head><th>Name<th>Purpose
<tr><th>Thread<td>Implements threads
<tr><th>Queue<td>Implements FIFO queues
<tr><th>Buffer<td>Data buffering
<tr><th>Dcache<td>Data caching (e.g., for disks)
<tr><th>FSmap<td>File to flat storage mapping
<tr><th>Clock<td>The system clock
<tr><th>CookTTYin<td>Keyboard input editor
<tr><th>CookTTYout<td>Output editor and format conversion
<tr><th>VT-100<td>Emulates DEC's VT100 terminal
<tr><th>Twindow<td>Text display window
<tr><th>Gwindow<td>Graphics (bit-mapped) display window
<tr><th>Probe<td>Measurements and statistics gathering
<tr><th>Symtab<td>Symbol table (associative mapping)
</table>
<p>Table 4.1 contains a list of the basic quajects in Synthesis. More complex kernel services are built out of the basic quajects by composition. For example, the Synthesis kernel has no pre-defined notion of a "process." But a <span class=smallcaps>Unix</span>-like process can be created by instantiating a thread quaject, a memory quaject, some I/O quajects, and interconnecting them in a particular way.
<h3>4.1.1 Quaject Interfaces</h3>
<p>The interface to a quaject consists of callentries, callbacks, and callouts. A client uses the services of a quaject by calling a callentry. Normally a callentry invocation simply returns. Exceptional situations return along callbacks. Callouts are places in the quaject where external calls to other quajects' callentries happen. Tables 4.2, 4.3, and 4.4 list the interfaces to the Synthesis basic kernel quajects.
<div class=code>
<pre>
+-----------+--------------------+-----------+
| Qput      |                    | Qget      |
+-----------+   ---+--+--+       +-----------+
| Qfull     |  o o |  |  |       | Qempty    |
+-----------+   ---+--+--+       +-----------+
| Qnotfull  |                    | Qnotempty |
+-----------+--------------------+-----------+
</pre>
<p class=caption>Figure 4.1: Queue Quaject</p>
</div>
<p>Callentries are analogous to methods in object-oriented systems. The other two, callbacks and callouts, have no direct analogue in object-oriented systems. Conceptually, a callout is a function pointer that has been initialized to point to another quaject's callentry; callbacks point back to the invoker. Callouts are an important part of the interface because they specify what type of external call is needed, making it possible to dynamically link any one of several different quajects' callentries to a particular callout, so long as the type matches. For example, the Synthesis buffer quaject has a flush callout which is invoked when the buffer is full. This enables the same buffer implementation to be used throughout the kernel simply by instantiating a buffer quaject and linking its flush callout to whatever downstream processing is appropriate for the instance.
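<p>The conceptual view of a callout as a typed function pointer can be sketched in C. This is only an illustration of the idea, not Synthesis code: the names <em>buffer_out</em>, <em>buffer_link</em>, <em>buffer_put</em>, and <em>count_bytes</em> are hypothetical, and the extra <em>ctx</em> argument is an artifact of plain C that the runtime-generated code of the real system avoids.</p>

```c
#include <stddef.h>

#define BUF_CAP 4  /* deliberately tiny so the demo fills it quickly */

/* A callout modeled as a typed function pointer: the type fixes the
   signature that any downstream callentry must match. */
typedef void (*flush_callout)(const char *data, size_t len, void *ctx);

struct buffer_out {
    char          data[BUF_CAP];
    size_t        used;
    flush_callout flush;      /* filled in when the quaject is linked */
    void         *flush_ctx;  /* state of the downstream stage (C artifact) */
};

/* "Linking": store the downstream callentry in the callout slot. */
void buffer_link(struct buffer_out *b, flush_callout f, void *ctx)
{
    b->used = 0;
    b->flush = f;
    b->flush_ctx = ctx;
}

/* The put callentry: append one element and invoke the flush callout
   when the buffer becomes full. */
void buffer_put(struct buffer_out *b, char c)
{
    b->data[b->used++] = c;
    if (b->used == BUF_CAP) {
        b->flush(b->data, b->used, b->flush_ctx);
        b->used = 0;
    }
}

/* One possible downstream stage: count the bytes it receives. */
void count_bytes(const char *data, size_t len, void *ctx)
{
    (void)data;
    *(size_t *)ctx += len;
}

/* Push n bytes through a buffer linked to count_bytes and report how
   many bytes were flushed downstream. */
size_t demo_flushed(size_t n)
{
    struct buffer_out b;
    size_t total = 0;
    buffer_link(&b, count_bytes, &total);
    for (size_t i = 0; i < n; i++)
        buffer_put(&b, 'x');
    return total;
}
```

<p>Linking the same buffer to a different function of type <em>flush_callout</em> changes where the data goes without touching <em>buffer_put</em>.</p>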
<p>The quaject interface is better illustrated using a simple quaject as an example - the FIFO queue, shown in Figure 4.1. The Synthesis kernel supports four different types of queues, to optimize for the varying synchronization needs of different combinations of single or multiple producers and consumers (synchronization is discussed in Chapter 5). All four types support the same abstract type [6], defined by two callentry references, <em>Qput</em> and <em>Qget</em>, which put and get elements of the queue. Both these callentry references return synchronously under the normal condition (successful insertion or deletion). Under other conditions, the queue returns through the callbacks.
<p>The queue has four callbacks, which return queue-full and queue-empty conditions back to the caller. <em>Qempty</em> is invoked when a <em>Qget</em> fails because the queue is empty. <em>Qfull</em> is invoked when a <em>Qput</em> fails because the queue is full. <em>Qnotempty</em> is called after a previous <em>Qget</em> had failed and then an element was inserted. And <em>Qnotful</em> is called after a previous <em>Qput</em> had failed and then an element was deleted. The idea is: instead of returning a condition code for interpretation by the invoker, the queue quaject directly calls the appropriate handling routines supplied by the invoker, speeding execution by eliminating the interpretation of return status codes.
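<p>A minimal C sketch of this callback protocol follows. It assumes a fixed-capacity queue of integers and models each callback as a function pointer; the <em>trace</em> structure, the <em>on_*</em> handlers, and <em>demo_trace</em> are hypothetical names introduced only to make the order of callback invocations observable.</p>

```c
#include <stddef.h>

#define QCAP 2

typedef void (*callback)(void *invoker);

struct queue {
    int      items[QCAP];
    size_t   head, count;
    int      put_failed, get_failed;  /* remember an earlier failure */
    callback Qfull, Qnotful, Qempty, Qnotempty;
    void    *invoker;                 /* state handed to the callbacks */
};

/* Qput: the normal case simply returns; exceptional cases go through callbacks. */
void Qput(struct queue *q, int v)
{
    if (q->count == QCAP) {
        q->put_failed = 1;
        q->Qfull(q->invoker);
        return;
    }
    q->items[(q->head + q->count) % QCAP] = v;
    q->count++;
    if (q->get_failed) {              /* an earlier Qget found the queue empty */
        q->get_failed = 0;
        q->Qnotempty(q->invoker);
    }
}

void Qget(struct queue *q, int *out)
{
    if (q->count == 0) {
        q->get_failed = 1;
        q->Qempty(q->invoker);
        return;
    }
    *out = q->items[q->head];
    q->head = (q->head + 1) % QCAP;
    q->count--;
    if (q->put_failed) {              /* an earlier Qput found the queue full */
        q->put_failed = 0;
        q->Qnotful(q->invoker);
    }
}

/* Hypothetical invoker that records which callbacks fired, in order. */
struct trace { char log[16]; size_t n; };

void note(struct trace *t, char c)
{
    if (t->n + 1 < sizeof t->log) {
        t->log[t->n++] = c;
        t->log[t->n] = '\0';
    }
}

void on_full(void *t)     { note(t, 'F'); }
void on_notful(void *t)   { note(t, 'f'); }
void on_empty(void *t)    { note(t, 'E'); }
void on_notempty(void *t) { note(t, 'e'); }

const char *demo_trace(struct trace *t)
{
    struct queue q = { .Qfull = on_full, .Qnotful = on_notful,
                       .Qempty = on_empty, .Qnotempty = on_notempty,
                       .invoker = t };
    int v = 0;
    t->n = 0;
    t->log[0] = '\0';
    Qget(&q, &v);  /* queue empty: Qempty fires */
    Qput(&q, 1);   /* first insertion after the failed Qget: Qnotempty fires */
    Qput(&q, 2);   /* normal return, no callback */
    Qput(&q, 3);   /* queue full: Qfull fires */
    Qget(&q, &v);  /* first removal after the failed Qput: Qnotful fires */
    return t->log;
}
```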
<table class=table>
<caption>
Table 4.2: Interface to I/O Quajects
</caption>
<tr class=head><th>Quaject<th>Interface<th>Name<th>Purpose
<tr><td rowspan=6>Queue<td rowspan=2>Callentry<td><em>Qput</em><td>Insert element into queue
<tr><td><em>Qget</em><td>Remove element from queue
<tr><td rowspan=4>Callback<td><em>Qfull</em><td>Notify that the queue is full
<tr><td><em>Qnotful</em><td>Notify that the queue is no longer full
<tr><td><em>Qempty</em><td>Notify that the queue is empty
<tr><td><em>Qnotempty</em><td>Notify that the queue is no longer empty
<tr><td rowspan=4>BufferOut<td rowspan=3>Callentry<td><em>put</em><td>Insert an element into the buffer
<tr><td><em>write</em><td>Insert a string of elements into the buffer
<tr><td><em>flush</em><td>Force buffer contents to output
<tr><td rowspan=1>Callout<td><em>flush</em><td>Dump out the full buffer
<tr><td rowspan=3>BufferIn<td rowspan=2>Callentry<td><em>get</em><td>Get a single element from the buffer
<tr><td><em>read</em><td>Get a string of elements from the buffer
<tr><td rowspan=1>Callout<td><em>fill</em><td>Replenish the empty buffer
<tr><td rowspan=4>CookTTYin<td rowspan=2>Callentry<td><em>getchar</em><td>Read a processed character from the edit buffer
<tr><td><em>read</em><td>Read a string of characters from the edit buffer
<tr><td rowspan=2>Callout<td><em>raw_get</em><td>Get new characters from user's keyboard
<tr><td><em>echo</em><td>Echo user's typed characters
<tr><td rowspan=3>CookTTYout<td rowspan=2>Callentry<td><em>putchar</em><td>Send a character out for processing
<tr><td><em>write</em><td>Send a string of characters out for processing
<tr><td rowspan=1>Callout<td><em>raw_write</em><td>Write out processed characters to display
<tr><td rowspan=4>VT100<td rowspan=4>Callentry<td><em>putchar</em><td>Write a character to the virtual VT-100 screen
<tr><td><em>write</em><td>Write a string of characters
<tr><td><em>update</em><td>Propagate changes to the virtual screen image
<tr><td><em>refresh</em><td>Propagate the entire virtual screen image
<tr><td rowspan=4>FSmap<td rowspan=2>Callentry<td><em>aread</em><td>Asynchronous read from file
<tr><td><em>awrite</em><td>Asynchronous write to file
<tr><td rowspan=2>Callout<td><em>ca_read</em><td>Read from disk cache
<tr><td><em>ca_write</em><td>Write to disk cache
<tr><td rowspan=4>Dcache<td rowspan=2>Callentry<td><em>read</em><td>Read from data cache
<tr><td><em>write</em><td>Write to data cache
<tr><td rowspan=2>Callout<td><em>bk_read</em><td>Read from backing store
<tr><td><em>bk_write</em><td>Write to backing store
<tr><td rowspan=1>T_window<td rowspan=1>Callentry<td><em>write</em><td>Write a string of (character,attribute) pairs
<tr><td rowspan=1>G_window<td rowspan=1>Callentry<td><em>blit</em><td>Copy a rectangular array of pixels to window
</table>
<table class=table>
<caption>
Table 4.3: Interface to other Kernel Quajects
</caption>
<tr class=head><th>Quaject<th>Interface<th>Name<th>Purpose
<tr><td rowspan=11>Thread<td rowspan=8>Callentry<td><em>suspend</em><td>Suspends thread execution
<tr><td><em>resume</em><td>Resumes thread execution
<tr><td><em>stop</em><td>Prevents execution
<tr><td><em>step</em><td>Executes one instruction then stops
<tr><td><em>interrupt</em><td>Send a software interrupt
<tr><td><em>signal</em><td>Send a software signal
<tr><td><em>wait</em><td>Wait for an event
<tr><td><em>notify</em><td>Notify that event has happened
<tr><td rowspan=3>Callout<td><em>read[i]</em><td>Read from quaject i
<tr><td><em>write[i]</em><td>Write to quaject i
<tr><td><em>call[i][e]</em><td>Call callentry e in quaject i
<tr><td rowspan=5>Clock<td rowspan=4>Callentry<td><em>gettime</em><td>Get the time of day, in "ticks"
<tr><td><em>getunits</em><td>Learn how many "ticks" there are in a second
<tr><td><em>alarm</em><td>Set an alarm: call given procedure at given time
<tr><td><em>cancel</em><td>Cancel an alarm
<tr><td rowspan=1>Callout<td><em>call[i]</em><td>Call procedure i upon alarm expiration
<tr><td rowspan=2>Probe<td rowspan=2>Callentry<td><em>probe</em><td>Tell which procedure to measure
<tr><td><em>show</em><td>Display statistics
<tr><td rowspan=2>Symtab<td rowspan=2>Callentry<td><em>lookup</em><td>Lookup a string; return its associated value
<tr><td><em>add</em><td>Add entry to symbol table
</table>
<table class=table>
<caption>
Table 4.4: Interface to Device Quajects
</caption>
<tr class=head><th>Quaject<th>Interface<th>Name<th>Purpose
<tr><td rowspan=3>Serial_in<td rowspan=2>Callentry<td><em>enable</em><td>Enable input
<tr><td><em>disable</em><td>Disable input
<tr><td rowspan=1>Callout<td><em>putchar</em><td>Write received character
<tr><td rowspan=3>Serial_out<td rowspan=2>Callentry<td><em>enable</em><td>Enable output
<tr><td><em>disable</em><td>Disable output
<tr><td rowspan=1>Callout<td><em>getchar</em><td>Obtain character to send
<tr><td rowspan=3>Sound_CD<td rowspan=2>Callentry<td><em>enable</em><td>Enable input
<tr><td><em>disable</em><td>Disable input
<tr><td rowspan=1>Callout<td><em>put_sample</em><td>Store sound sample received from CD player
<tr><td rowspan=3>Sound_DA<td rowspan=2>Callentry<td><em>enable</em><td>Enable output
<tr><td><em>disable</em><td>Disable output
<tr><td rowspan=1>Callout<td><em>get_sample</em><td>Get new sound sample to send to D/A device
<tr><td rowspan=4>Framebuffer<td rowspan=2>Callentry<td><em>blit</em><td>Copy memory bitmap to framebuffer
<tr><td><em>intr_ctl</em><td>Enable or disable interrupts
<tr><td rowspan=2>Callout<td><em>Vsync</em><td>Vertical sync interrupt
<tr><td><em>Hsync</em><td>Horizontal sync interrupt
<tr><td rowspan=5>Disk<td rowspan=4>Callentry<td><em>aread</em><td>Asynchronous read
<tr><td><em>awrite</em><td>Asynchronous write
<tr><td><em>format</em><td>Format the disk
<tr><td><em>blk_size</em><td>Learn the disk's block size
<tr><td rowspan=1>Callout<td><em>new disk</em><td>(Floppy) disk has been changed
</table>
<h3>4.1.2 Creating and Destroying Quajects</h3>
<p>Each class of quaject has create and destroy callentries that instantiate and destroy members of that class, including creating all their runtime-generated code. Creating a quaject involves allocating a single block of memory for its data and code, then initializing portions of that memory. With few exceptions, all of a quaject's runtime-generated code is created during this initialization. This generally involves copying the appropriate code template, determined by the type of quaject being created and the situation in which it is to be used, and then filling in the address fields in the instructions that reference quaject-specific data items. There are two exceptions to the rule. One is when the quaject implementation uses self-modifying code. The other occurs during the handling of callouts when linking one quaject to another. This is covered in the next section.
<p>Kernel quajects are created whenever they are needed to build higher-level services. For example, opening an I/O pipe creates a queue; opening a text window creates three quajects: a window, a VT-100 terminal emulator, and a TTY-cooker. Which quajects get created and how they are interconnected is determined by the implementation of each service.
<p>Quajects may also be created at the user level, simply by calling the class's create callentry from a user-level thread. The effect is identical to creating kernel quajects, except that user memory is allocated and filled, and the resulting quajects execute in user mode, not kernel mode. The kernel does not concern itself with what happens to such user-level quajects. It merely offers creation and linkage services to applications that want to use them.
<p>Quajects are destroyed when they are no longer needed. Invoking the destroy callentry signals that a particular thread no longer needs a quaject. The quaject itself is not actually destroyed until all references to it are severed; reference counts are used to track this. Circular references could prevent destruction of otherwise useless quajects, but this has not been a problem because quajects tend to be connected in cycle-free graphs. Destroying quajects does not immediately deallocate their memory. They are instead placed in the inactive list for their class. This speeds subsequent creation because much of the code-generation and initialization work has already been done.<sup>1</sup> As heap memory runs out, memory belonging to quajects on the inactive list is recycled.
<div class=footnote><sup>1</sup> Performance measurements in this dissertation were carried out without using the inactive list, but creating fresh quajects as needed.</div>
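<p>The destroy path described above (reference counting, then parking on a per-class inactive list, then recycling) can be sketched as follows. All names are hypothetical, the <em>initialized</em> flag merely stands in for the generated code already being in place, and <em>q_recycle</em> is called explicitly here rather than in response to heap pressure.</p>

```c
#include <assert.h>
#include <stdlib.h>

struct quaject {
    int             refs;
    int             initialized;   /* stands in for generated code being in place */
    struct quaject *next_inactive;
};

struct quaject_class {
    struct quaject *inactive;      /* destroyed but not yet recycled */
    int             fresh_inits;   /* how often full initialization ran */
};

struct quaject *q_create(struct quaject_class *c)
{
    struct quaject *q = c->inactive;
    if (q != NULL) {
        c->inactive = q->next_inactive;   /* reuse: initialization is skipped */
    } else {
        q = calloc(1, sizeof *q);
        if (q == NULL)
            return NULL;
        q->initialized = 1;               /* template copy and fix-ups would go here */
        c->fresh_inits++;
    }
    q->refs = 1;
    q->next_inactive = NULL;
    return q;
}

void q_ref(struct quaject *q) { q->refs++; }

/* Drop one reference; park the quaject on the inactive list at zero. */
void q_destroy(struct quaject_class *c, struct quaject *q)
{
    if (--q->refs > 0)
        return;
    q->next_inactive = c->inactive;
    c->inactive = q;
}

/* Return inactive memory to the heap (the text does this when memory runs out). */
void q_recycle(struct quaject_class *c)
{
    while (c->inactive != NULL) {
        struct quaject *q = c->inactive;
        c->inactive = q->next_inactive;
        free(q);
    }
}

int demo_fresh_inits(void)
{
    struct quaject_class c = { NULL, 0 };
    struct quaject *a = q_create(&c);  /* fresh */
    assert(a != NULL);
    q_ref(a);                          /* a second holder appears */
    q_destroy(&c, a);                  /* still referenced: stays alive */
    struct quaject *b = q_create(&c);  /* fresh: nothing is inactive yet */
    assert(b != NULL);
    q_destroy(&c, a);                  /* last reference: a becomes inactive */
    struct quaject *d = q_create(&c);  /* taken from the inactive list */
    int reused = (d == a);
    q_destroy(&c, b);
    q_destroy(&c, d);
    q_recycle(&c);
    return reused ? c.fresh_inits : -1;
}
```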
<h3>4.1.3 Resolving References</h3>
<p>The kernel resolves quaject callentry and callback references when linking quajects to build services. Conceptually, callouts and callbacks are function pointers that are initialized to point to other quajects' callentries when quajects are linked. For example, when attaching a queue to a source of data, the kernel fills the callouts of the data source with the addresses of the corresponding callentries in the queue and initializes the queue's callbacks with the addresses of the corresponding exception handlers in the data source. If the source of data is a thread, the address of the queue's <em>Qput</em> callentry is stored in the thread's write callout, the queue's <em>Qfull</em> callback is linked to the thread's suspend callentry, and the queue's <em>Qnotful</em> callback is linked to the thread's resume callentry. See Figure 4.2.
<p>In the actual implementation, a callout is a "hole" in the quaject's memory where linkage-specific runtime-generated code is placed. Generally, this code consists of zero or more instructions that save any machine registers used by both caller and callee quajects, followed by a jsr instruction to invoke the target callentry, followed by zero or more instructions to restore the previously saved registers. The callout's code might also perform a context switch if the called quaject is in a different address space. Or, in the case when the code comprising the called quaject's callentry is in the same address space and is smaller than the space set aside for the callout, the callentry is copied in its entirety into the callout. This is how the layer-collapsing, in-line expansion optimization of Section 3.2.2 works. A flag bit in each callentry tells if it uses self-modifying code, in which case the copy does not happen.
<p>Most linkage is done without referencing any symbol tables, but using information that is known at system generation time. Basically, the linking consists of blindly storing addresses in various places, being assured that they will always "land" in the correct place in the generated code. Similarly, no runtime type checking is required, as all such information has been resolved at system generation time.
<p>Not all references must be specified or filled. Each quaject provides default values for its callouts and callbacks that define what happens when a particular callout or callback is needed but not connected. The action can be as simple as printing an error message and aborting the operation or as complicated as dynamically creating the missing quaject, linking the reference, and continuing.
<p>In addition, the kernel can also resolve references in response to execution traps that invoke the dynamic linker. Such references are represented by ASCII names. The name <em>Qget</em>, for example, refers to the queue's callentry. A symbol-table quaject maps the string names into the actual addresses and displacements. For example, the <em>Qget</em> callentry is represented in the symbol table as a displacement from the start of the queue quaject. Which quaject is being referenced is usually clear from context. For example, callentries are usually invoked using a register-plus-offset addressing mode; the register contains the address of the quaject in question. When not, an additional parameter disambiguates the reference.
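<p>The name-to-displacement mapping can be illustrated with a small C sketch. It assumes that each callentry is represented by a function-pointer slot inside the quaject structure (in Synthesis the callentry code itself lives at the displacement), and the names <em>queue_quaject</em>, <em>resolve</em>, <em>put_impl</em>, and <em>get_impl</em> are hypothetical.</p>

```c
#include <stddef.h>
#include <string.h>

typedef int (*callentry)(void *self, int arg);

struct queue_quaject {
    int       count;
    callentry Qput;
    callentry Qget;
};

/* Symbol table: ASCII name -> displacement from the start of the quaject. */
struct symbol { const char *name; size_t disp; };

static const struct symbol queue_symbols[] = {
    { "Qput", offsetof(struct queue_quaject, Qput) },
    { "Qget", offsetof(struct queue_quaject, Qget) },
};

/* Base register plus displacement: the base says which quaject is meant. */
callentry *resolve(void *base, const char *name)
{
    for (size_t i = 0; i < sizeof queue_symbols / sizeof queue_symbols[0]; i++)
        if (strcmp(queue_symbols[i].name, name) == 0)
            return (callentry *)((char *)base + queue_symbols[i].disp);
    return NULL;  /* unknown name */
}

/* Stand-in callentries that only adjust a counter. */
int put_impl(void *self, int arg)
{
    struct queue_quaject *q = self;
    q->count += arg;
    return q->count;
}

int get_impl(void *self, int arg)
{
    struct queue_quaject *q = self;
    q->count -= arg;
    return q->count;
}

int demo_resolve(const char *name, int arg)
{
    struct queue_quaject q = { 10, put_impl, get_impl };
    callentry *slot = resolve(&q, name);
    if (slot == NULL)
        return -1;
    return (*slot)(&q, arg);
}
```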
<h3>4.1.4 Building Services</h3>
<p>Higher-level kernel services are built by composing several basic quajects. I now show, by means of an example, how a data channel is put together. The example illustrates the usage of queues and reference resolution. It also shows how a data channel can support two kinds of interfaces, blocking and non-blocking, using the same quaject building block. The queue quaject used is of type ByteQueue.<sup>2</sup>
<div class=footnote><sup>2</sup> The actual implementation of Synthesis V.1 uses an optimized version of ByteQueue that has a string-oriented interface to reduce looping, but the semantics is the same.</div>
<p>Figure 4.2 shows a producer thread using the <em>Qput</em> callentry to store bytes in the queue. The ByteQueue's <em>Qfull</em> callback is linked to the thread's suspend callentry; the ByteQueue's <em>Qnotful</em> callback is linked to the thread's resume callentry. As long as the queue is not full, calls to <em>Qput</em> enqueue the data and return normally. When the queue becomes full, the queue invokes the <em>Qfull</em> callback, suspending the producer thread. When the ByteQueue's reader removes a byte, the <em>Qnotful</em> callback is invoked, awakening the producer thread. This implements the familiar synchronous interface to an I/O stream.
<table class=fig>
<caption>
Figure 4.2: Blocking write
</caption>
<tr><td>Kind of Reference<td>User Thread<td><td colspan=2>ByteQueue<td>Device Driver<td>Hardware
<tr><td>callentry<td>write<td>&#8658;<td><em>Qput</em><td><em>Qget</em><td>&#8656;<td>send-complete interrupt
<tr><td>callback<td>suspend<td>&#8656;<td><em>Qfull</em><td><em>Qempty</em><td>&#8658;<td>turn off send-complete
<tr><td>callback<td>resume<td>&#8656;<td><em>Qnotful</em><td><em>Qnotempty</em><td>&#8658;<td>turn on send-complete
</table>
<table class=fig>
<caption>
Figure 4.3: Non-blocking write
</caption>
<tr><td>Reference<td>Thread<td><td>ByteQueue
<tr><td>callentry<td>write<td>&#8658;<td><em>Qput</em>
<tr><td>callback<td>return to caller<td>&#8656;<td><em>Qfull</em>
<tr><td>callback<td>if(more work) goto Qput<td>&#8656;<td><em>Qnotful</em>
</table>
<p>Contrast this with Figure 4.3, which shows a non-blocking interface to the same data channel implemented using the same queue quaject. Only the connections between ByteQueue and the thread change. The thread's write callout still connects to the queue's <em>Qput</em> callentry. But the queue's callbacks no longer invoke procedures that suspend or resume the producer thread. Instead, they return control back to the producer thread, functioning, in effect, like interrupts that signal events -- in this example, the filling and emptying of the queue. When the queue fills, the <em>Qfull</em> callback returns control back to the producer thread, freeing it to do other things without waiting for output to drain and without having written the bytes that did not fit. The thread knows the write is incomplete because control flow returns through the callback, not through <em>Qput</em>. After output drains, <em>Qnotful</em> is called, invoking an exception handler in the producer thread which checks whether there are remaining bytes to write, and if so, it goes back to <em>Qput</em> to finish the job.
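<p>The contrast between Figures 4.2 and 4.3 can be sketched in C by linking one queue implementation in two ways. This is a simplified model under stated assumptions: there are no real threads, the producer is a small record of flags, and control returns through <em>Qput</em>'s return value rather than through the callback itself. The names <em>link_queue</em>, <em>note_incomplete</em>, <em>finish_write</em>, and <em>demo_state</em> are hypothetical.</p>

```c
#include <stddef.h>

#define CAP 3

enum mode { BLOCKING, NONBLOCKING };

/* Stand-in for the producer thread's state. */
struct producer { int suspended; int retry_pending; };

typedef void (*callback)(struct producer *p);

struct byte_queue {
    char             data[CAP];
    size_t           head, count;
    int              put_failed;
    callback         Qfull, Qnotful;
    struct producer *owner;
};

/* Blocking linkage targets (Figure 4.2). */
void thread_suspend(struct producer *p)  { p->suspended = 1; }
void thread_resume(struct producer *p)   { p->suspended = 0; }

/* Non-blocking linkage targets (Figure 4.3). */
void note_incomplete(struct producer *p) { p->retry_pending = 1; }
void finish_write(struct producer *p)    { p->retry_pending = 0; }

/* Only the connections differ; the queue code below is shared. */
void link_queue(struct byte_queue *q, struct producer *p, enum mode m)
{
    q->head = 0;
    q->count = 0;
    q->put_failed = 0;
    q->owner = p;
    q->Qfull   = (m == BLOCKING) ? thread_suspend : note_incomplete;
    q->Qnotful = (m == BLOCKING) ? thread_resume  : finish_write;
}

/* Returns 1 if the byte was stored, 0 if Qfull was invoked instead. */
int Qput(struct byte_queue *q, char c)
{
    if (q->count == CAP) {
        q->put_failed = 1;
        q->Qfull(q->owner);
        return 0;
    }
    q->data[(q->head + q->count) % CAP] = c;
    q->count++;
    return 1;
}

/* Consumer side; Qempty handling is omitted from this sketch. */
int Qget(struct byte_queue *q, char *out)
{
    if (q->count == 0)
        return 0;
    *out = q->data[q->head];
    q->head = (q->head + 1) % CAP;
    q->count--;
    if (q->put_failed) {
        q->put_failed = 0;
        q->Qnotful(q->owner);
    }
    return 1;
}

/* Overfill the queue, optionally drain one byte, and encode the
   producer's state as suspended*2 + retry_pending. */
int demo_state(enum mode m, int drain)
{
    struct byte_queue q;
    struct producer p = { 0, 0 };
    char c;
    link_queue(&q, &p, m);
    for (int i = 0; i < CAP + 1; i++)
        if (!Qput(&q, 'a'))
            break;
    if (drain)
        Qget(&q, &c);
    return p.suspended * 2 + p.retry_pending;
}
```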
<p>Ritchie's Stream I/O system has a similar flavor: it too provides a framework for attaching stages of processing to an I/O stream [27]. But Stream I/O's queueing structure is fixed, the implementation is based on messages, and the I/O is synchronous. Unlike Stream I/O, quajects offer a finer level of control and expanded possibilities for connection. The previous example illustrates this by showing how the same queue quaject can be connected in different ways to provide either synchronous or asynchronous I/O. Furthermore, quajects extend the idea to include non-I/O services as well, such as threads.
<h3>4.1.5 Summary</h3>
<p>In the implementation of the Synthesis kernel, quajects provide encapsulation and make all inter-module dependencies explicit. Although quajects differ from objects in traditional O-O systems because of a procedural interface and run-time code generation implementation, the benefits of encapsulation and abstraction are preserved in a highly efficient implementation.
<p>I have shown, using the data channel as an example, how quajects are composed to provide important services in the Synthesis kernel. That example also illustrates the main points of a quaject interface:
<ul>
<li>Callentry references implement object-oriented-like methods and bypass interpretation in the invoked quaject.
<li>Callback references implement return codes and bypass interpretation in the invoker.
<li>The operation semantics are determined dynamically by the quaject interconnections, independent of the quaject's implementation.
</ul>
<p>This last point is fundamental in allowing a truly orthogonal quaject implementation, for example, enabling a queue to be implemented without needing any knowledge of how threads work - not even how to suspend and resume them.
<p>The next section shows how the quaject ideas fit together to provide user-level services.
<h2>4.2 Procedure-Based Kernel</h2>
<p>Two fundamental ideas underlie how Synthesis is structured and how the services are invoked:
<ul>
<li>Every callentry is a real, distinct procedure.
<li>Services are invoked by calling these procedures.
</ul>
<p>Quaject callentries are small procedures stored at known, fixed offsets from the base of the block of memory that holds the quaject's state. For simple callentries, the entire procedure is stored in the allocated space of the structure. Quajects such as buffers and queues have their callentries expanded in this manner, using all the runtime code-generation ideas discussed in Chapter 3. For more complex callentries, the procedures usually consist of some instance-specific code to handle the common execution paths, followed by code that loads the base pointer of the quaject's structure into a machine register and jumps to shared code implementing the rest of the callentry.
<p>This representation differs from that of methods in object-oriented languages such as C++. In these languages, the object's structure contains pointers to generic methods for that class of object, not the methods themselves. The language system passes a pointer to the object's structure as an extra parameter to the procedure implementing each method. This makes it hard to use an object's method as a real function, one whose address can be passed to other functions without also passing and dealing with the extra parameter.
<p>It is this difference that forms the basis of Synthesis quaject composition and extensible kernel service. Every callentry is a real procedure, each with a unique address and expecting no "extraneous" parameters. Each queue's <em>Qput</em>, for example, takes exactly one parameter: the data to be enqueued. This property is fundamental for easy quaject composition: each quaject in a chain simply calls the next, without passing an arbitrarily long array of structure pointers downstream, one for each quaject.
<h3>4.2.1 Calling Kernel Procedures</h3>
<p>The discussion until now assumes that the callentries reside in the same address space and execute at the same privilege level as their caller, so that direct procedure call is possible. But when user-level programs invoke kernel quajects, e.g., to read a file, the invocation crosses a protection boundary. A direct procedure call would not work because the kernel routine needs to run in supervisor mode.
<p>In a conventional operating system, such as <span class=smallcaps>Unix</span>, application programs invoke the kernel by making system calls. But while system calls provide a controlled, protected way for a user-level program to invoke procedures in the kernel, they are limited in that they allow access to only a fixed set of procedures in the kernel. For Synthesis to be extensible, it needs an extensible kernel call mechanism; a mechanism that supports a protected, user-level interface to arbitrary kernel quajects.
<p>The user-level interface is supplied with stub quajects. Stub quajects reside in the user address space and have the same callentries, with the same offsets, as the kernel quaject which they represent. Invoking a stub's callentry from user-level results in the corresponding kernel quaject's callentry being invoked and the results returned back.
<p>This is implemented in the following way. The stub's callentries consist of tiny procedures that load a number into a machine register and then execute a trap instruction. The number identifies the desired kernel procedure. The trap switches the processor into kernel mode, where it executes the kernel-procedure dispatcher. The dispatcher uses the procedure number parameter to index a thread-specific table of kernel procedure addresses. Simple limit checks ensure the index is in range and that only the allowed procedures are called. If the checks pass, the dispatcher invokes the kernel procedure on behalf of the user-level application.
<p>There are many benefits to this design. One is that it extends the kernel quaject interface transparently to user-level, allowing kernel quajects to be composed with user-level quajects. A stub's callentries are real procedures: their addresses can be passed to other functions or stored in tables; they can be in-line substituted into other procedures and optimized using the code-synthesis techniques of Section 3.2 applied at the user level. Another advantage, which has already been discussed in Section 3.3.4, is that a very efficient implementation exists. The result is that the protection boundary becomes fluid; what is placed in the kernel and what is done at user-level can be chosen at will, not dictated by the design of the system. In short, all the advantages of kernel quajects have been extended out to user level.
<h3>4.2.2 Protection</h3>
<p>Kernel procedure calls are protected because the user program can only specify indices into the kernel procedure table (KPT), so the kernel quajects are guaranteed to execute only from legitimate entry points, and because the index is checked before being used, only valid entries in the table can be accessed.
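<p>The index check performed by the dispatcher can be sketched in C as below. The sketch models the trap as an ordinary function call and assumes that disallowed slots in the table are marked with a null pointer; the names <em>kpt</em>, <em>dispatch</em>, <em>k_double</em>, and <em>k_negate</em> are hypothetical.</p>

```c
#include <stddef.h>

typedef long (*kernel_proc)(long arg);

/* Thread-specific kernel procedure table. */
struct kpt {
    const kernel_proc *entries;
    size_t             count;
};

/* Dispatcher: the caller supplies only an index, never an address.
   Returns 0 and stores the result on success, -1 if the index is
   out of range or names a slot that is not allowed. */
int dispatch(const struct kpt *t, size_t index, long arg, long *result)
{
    if (index >= t->count || t->entries[index] == NULL)
        return -1;
    *result = t->entries[index](arg);
    return 0;
}

/* Stand-in kernel procedures. */
long k_double(long x) { return 2 * x; }
long k_negate(long x) { return -x; }

/* What a stub callentry does, with the trap modeled as a plain call. */
int demo_call(size_t number, long arg, long *result)
{
    static const kernel_proc procs[] = { k_double, NULL, k_negate };
    const struct kpt t = { procs, sizeof procs / sizeof procs[0] };
    return dispatch(&t, number, arg, result);
}
```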
<h3>4.2.3 Dynamic Linking</h3>
<p>Synthesis supports two flavors of dynamic linking: load-link, which resolves external references at program load time, before execution begins; and run-link, which resolves references at runtime as they are needed. Run-link has the advantage of allowing execution of programs with undefined references as long as the execution path does not cross them, simplifying debugging and testing of unfinished programs.
<p>Dynamic linking does not prevent sharing or paging of executable code. It is possible to share dynamically-linked code because the runtime libraries always map to the same address in all address spaces. It is possible to page run-linked code and throw away infrequently used pages instead of writing them to backing store because the dynamic linker will re-link the references should the old page be needed again.
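<p>Run-link behavior can be approximated in C with a slot that is resolved by name on first use. This is a sketch under simplifying assumptions: the "library" is a static table, a discarded page is modeled by clearing the slot, and the names <em>link_slot</em>, <em>call_linked</em>, and <em>discard_link</em> are hypothetical.</p>

```c
#include <stddef.h>
#include <string.h>

typedef int (*func)(int);

int square(int x) { return x * x; }

/* Stand-in for the runtime library's exported names. */
struct lib_entry { const char *name; func fn; };
static const struct lib_entry library[] = { { "square", square } };

/* An external reference that starts out unresolved. */
struct link_slot {
    const char *name;
    func        target;       /* NULL until first use */
    int         resolutions;  /* how many times the linker ran */
};

/* Resolve on first use, then call. Returns -1 for an undefined
   reference, which only matters if execution actually reaches it. */
int call_linked(struct link_slot *s, int arg, int *result)
{
    if (s->target == NULL) {
        for (size_t i = 0; i < sizeof library / sizeof library[0]; i++) {
            if (strcmp(library[i].name, s->name) == 0) {
                s->target = library[i].fn;
                s->resolutions++;
                break;
            }
        }
        if (s->target == NULL)
            return -1;
    }
    *result = s->target(arg);
    return 0;
}

/* Models throwing away a page of run-linked code: the link is lost
   and will be re-established on the next call. */
void discard_link(struct link_slot *s) { s->target = NULL; }
```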
<h2>4.3 Threads of Execution</h2>
<p>Synthesis threads are light-weight processes, implemented by the thread quaject. Each Synthesis thread (called simply "thread" from now on) executes in a context, defined by the thread table entry (TTE), which is the data part of the thread quaject holding the thread state and which contains:
<ul>
<li>The register save area to hold the thread's machine registers when the thread is not executing.
<li>The kernel procedure table (KPT) - that table of callouts described in 4.2.1.
<li>The signal table, used to dispatch software signals.
<li>The address mapping tables for virtual memory.
<li>The vector table - the hardware-defined array of starting addresses of exception handlers. The hardware consults this table to dispatch the hardware-detected exceptions: hardware interrupts, error traps (like division by zero), memory faults, and software-traps (system calls).
<li>The context-switch-in and context-switch-out procedures comprising the executable data structure of the ready queue.
</ul>
<p>Of these, the last two are unusual. The context-switch-in and -out procedures were already discussed in Section 3.3.2, which explains how executable data structures are used to implement fast context switching. Giving each thread its own vector table also differs from usual practice, which makes the vector table a global structure, shared by all threads or processes. By having a separate vector table per thread, Synthesis saves the dispatching cost of thread-specific exceptions. Since most of the exceptions are thread specific, the savings are significant. Examples include all the error traps, such as division by zero, and the VM-related traps, such as translation fault.
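<p>As a rough illustration, the TTE fields listed above might be laid out as a C structure like the one below, with a per-thread vector table that sends an exception straight to the current thread's handler. Field sizes, the three example vectors, and all names are assumptions made for this sketch; the actual layout is not given in the text.</p>

```c
#include <stddef.h>

enum vector { VEC_DIV_ZERO, VEC_TRANSLATION_FAULT, VEC_TRAP, VEC_COUNT };

struct tte;
typedef void (*handler)(struct tte *self);

/* Hypothetical thread table entry; sizes and types are illustrative. */
struct tte {
    unsigned long regs[16];            /* register save area */
    const void   *kpt;                 /* kernel procedure table */
    const void   *signal_table;        /* software signal dispatch */
    const void   *address_map;         /* virtual memory mapping tables */
    handler       vectors[VEC_COUNT];  /* per-thread vector table */
    handler       switch_in;           /* context-switch-in procedure */
    handler       switch_out;          /* context-switch-out procedure */
    int           last_action;         /* demo bookkeeping only */
};

/* Two threads may handle the same exception differently. */
void div_zero_report(struct tte *t) { t->last_action = 1; }
void div_zero_ignore(struct tte *t) { t->last_action = 2; }

/* With a table per thread, an exception goes directly to the current
   thread's handler; no global handler has to look up the thread first. */
void raise_exception(struct tte *current, enum vector v)
{
    if (current->vectors[v] != NULL)
        current->vectors[v](current);
}

int demo_per_thread_vectors(void)
{
    struct tte a = {0}, b = {0};
    a.vectors[VEC_DIV_ZERO] = div_zero_report;
    b.vectors[VEC_DIV_ZERO] = div_zero_ignore;
    raise_exception(&a, VEC_DIV_ZERO);
    raise_exception(&b, VEC_DIV_ZERO);
    return a.last_action * 10 + b.last_action;
}
```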
<h3>4.3.1 Execution Modes</h3>
<p>Threads can execute in one of two modes: supervisor mode and user mode. When a thread calls the kernel by issuing the trap instruction, it changes modes from user to supervisor. This view of things is in contrast to having a kernel server process run the kernel call on the behalf of the client thread. Each thread's memory mapping tables are set so that as the thread switches to supervisor mode, the kernel memory space becomes accessible in addition to the user space, in effect, "unioning" the kernel memory space with the user memory space. (This implies the set of addresses used must be disjoint.) Consequently, the kernel call may move data between the user memory and the kernel memory easily, without using special machine instructions, such as "moves" (move from/to alternate address space), that take longer to execute. Other memory spaces are outside the kernel space, inaccessible even from supervisor mode except through special instructions. Since no quaject's code contains those special instructions, Synthesis can easily enforce memory access restrictions for its kernel calls by using the normal user-level memory-access checks provided by the memory management unit. It first checks that no pointer is in the kernel portion of the address space (an easy check), and then proceeds to move the data. If an illegal access happens, or if a non-resident page is referenced, the thread will take a translation-fault exception, even from supervisor mode; the fault handler then reads in the referenced page from backing store if it was missing or prints the diagnostic message if the access is disallowed. (All this works because all quajects are reentrant, and since system calls are built out of quajects, all system calls are reentrant.)
<p>Synthesis threads also provide a mechanism by which routines executing in supervisor mode can make protected calls to user-mode procedures. It is mostly used to allow user-mode handling of exceptions that arise during supervisor execution, for example, someone typing "Control-C" while the thread is in the middle of a kernel call. It is also expected to find use in a future implementation of remote procedure call. The hard part in allowing user-level procedure calls is not in making the call, but in arranging for a protected return from user mode back to supervisor mode. This is done by pushing a special, exception-causing return address on the user stack. When the user procedure finishes and returns, the exception is raised, putting the thread back into supervisor mode.
<h3>4.3.2 Thread Operations</h3>
<p>As a quaject, the thread supports several operations, defined by its callentries. They are: <em>suspend</em>, <em>resume</em>, <em>stop</em>, <em>step</em>, <em>interrupt</em>, <em>signal</em>, <em>setsignal</em>, <em>wait</em>, and <em>notify</em>. The last four overlap functionality with the first five, but are included for programmer convenience.<sup>3</sup>
<div class=footnote><sup>3</sup> In the current implementation, the thread quaject is really a composition of two lower-level quajects, neither of them externally visible: a basic thread quaject which supports the five fundamental operations listed; and a hi thread quaject, which adds the higher-level operations. I'm debating whether I want to make the basic thread quaject visible.</div>
<p><em>Suspend</em> and <em>resume</em> control thread execution, disabling or re-enabling it. They are often the targets of I/O quajects' callbacks, implementing blocking I/O. <em>Stop</em> and <em>step</em> support debuggers: <em>stop</em> prevents thread execution; <em>step</em> causes a stopped thread to execute a single machine instruction and then re-enter the stopped state. The difference between <em>stop</em> and <em>suspend</em> is that a suspended thread still executes in response to interrupts and signals while a stopped one does not. <em>Resume</em> continues thread execution from either the stopped or suspended state.
<p><em>Interrupt</em> causes a thread to call a specified procedure, as if a hardware interrupt had happened. It takes two parameters, an address and a mode, and it causes the thread to call the procedure at the specified address in either user or supervisor mode according to the mode parameter. Suspended threads can be interrupted: they will execute the interrupt procedure and then re-enter the suspended state.
<p><em>Signal</em> is like interrupt, but with a level of indirection for protection and isolation. It takes an integer parameter, the signal number, and indexes the thread's signal-table with it, obtaining the address and mode parameters that are then passed to interrupt. <em>Setsignal</em> associates signal numbers with addresses of interrupt procedures and execution modes. It takes three parameters: the signal number, an address, and a mode; and it fills the table slot corresponding to the signal number with the address and mode.
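<p>The relationship between <em>setsignal</em>, <em>signal</em>, and <em>interrupt</em> amounts to a table lookup in front of a call. The following is a minimal sketch of that indirection only; the class <code>SignalThread</code>, its <code>log</code> attribute, and the use of Python callables in place of procedure addresses are assumptions made for illustration.

```python
class SignalThread:
    """Hypothetical model of a thread's signal table."""

    def __init__(self):
        self.signal_table = {}   # signal number -> (procedure, mode)
        self.log = []            # records what the thread was made to run

    def interrupt(self, procedure, mode):
        """Make the thread call `procedure` in the given mode."""
        self.log.append((mode, procedure()))

    def setsignal(self, number, procedure, mode):
        """Fill the table slot for `number` with a procedure and a mode."""
        self.signal_table[number] = (procedure, mode)

    def signal(self, number):
        """Look up the slot, then pass its contents to interrupt."""
        procedure, mode = self.signal_table[number]
        self.interrupt(procedure, mode)


t = SignalThread()
t.setsignal(2, lambda: "handled Control-C", "user")
t.signal(2)
print(t.log)
```

<p>The caller of <em>signal</em> names only an integer, so it never supplies an address or a mode itself; that is the protection the level of indirection buys.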
<p><em>Wait</em> waits for events to happen. It takes one parameter, an integer representing an event, and it suspends the thread until that event occurs. <em>Notify</em> informs the thread of the occurrence of events. It too takes one parameter, an integer representing an event, and it resumes the thread if it had been waiting for this event. The thread system does not concern itself with what is an event nor how the assignment of events to integers is made.
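<p>As a sketch of the <em>wait</em>/<em>notify</em> pairing, the model below tracks only the state transitions described above. The class <code>WaitingThread</code> and its two attributes are hypothetical; a real thread would be removed from and returned to the ready queue rather than having a flag flipped.

```python
class WaitingThread:
    """Hypothetical state model of wait and notify."""

    def __init__(self):
        self.suspended = False
        self.waiting_for = None   # event number, or None if not waiting

    def wait(self, event):
        """Suspend the thread until `event` occurs."""
        self.waiting_for = event
        self.suspended = True

    def notify(self, event):
        """Resume the thread only if it was waiting for this event."""
        if self.suspended and self.waiting_for == event:
            self.waiting_for = None
            self.suspended = False


t = WaitingThread()
t.wait(7)
t.notify(3)           # a different event: the thread stays suspended
print(t.suspended)
t.notify(7)           # the awaited event: the thread resumes
print(t.suspended)
```

<p>The event numbers carry no meaning inside the model, mirroring the statement that the thread system does not interpret them.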
<h3>4.3.3 Scheduling</h3>
<p>The Synthesis scheduling policy is round-robin with an adaptively adjusted CPU quantum per thread. Instead of priorities, Synthesis uses fine-grain scheduling, which assigns larger or smaller quanta to threads based on a "need to execute" criterion. A detailed explanation of fine-grain scheduling is postponed to Chapter 6. Here, I give only a brief informal summary.
<p>A thread's "need to execute" is determined by the rate at which I/O data flows through its I/O channels compared to the rate at which the running thread produces or consumes this I/O. Since CPU time consumed by the thread is an increasing function of the data flow, the faster the I/O rate the faster a thread needs to run. Therefore, the scheduling algorithm assigns a larger CPU quantum to such a thread. This kind of scheduling must have a fine granularity since the CPU requirements for a given I/O rate and the I/O rate itself may change quickly, requiring the scheduling policy to adapt to the changes.
<p>Effective CPU time received by a thread is determined by the quantum assigned to that thread divided by the sum of quanta assigned to all threads. Priorities can be simulated and preferential treatment can be given to certain threads in two ways: raising a thread's CPU quantum, and reordering the ready queue as threads block and unblock. As an event unblocks a thread, its TTE is placed at the front of the ready queue, giving it immediate access to the CPU. This minimizes response time to events. Synthesis' low-overhead context switch allows quanta to be considerably shorter than those of other operating systems without incurring excessive overhead. Nevertheless, to minimize time spent context switching, CPU quanta are adjusted to be as large as possible while maintaining the fine granularity. A typical quantum is on the order of a few hundred microseconds.
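<p>The share computation and the front-of-queue placement can be made concrete with a short example. The thread names and quantum values below are invented for illustration, and <code>cpu_shares</code> and <code>unblock</code> are hypothetical helper names; only the formula (a thread's quantum divided by the sum of all quanta) and the placement rule come from the text.

```python
from collections import deque


def cpu_shares(quanta):
    """Fraction of CPU each thread receives under round-robin.

    quanta maps a thread name to its quantum in microseconds.
    """
    total = sum(quanta.values())
    return {name: q / total for name, q in quanta.items()}


def unblock(ready_queue, name):
    """An unblocked thread goes to the FRONT of the ready queue."""
    ready_queue.appendleft(name)


# Invented quanta: the I/O-heavy thread has been given a larger quantum.
shares = cpu_shares({"audio": 300, "editor": 100, "compiler": 100})
print(shares)

ready = deque(["editor", "compiler"])
unblock(ready, "audio")
print(list(ready))
```

<p>Raising one quantum lowers every other thread's share without touching their quanta, which is how a larger quantum can stand in for a higher priority.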
<h2>4.4 Input and Output</h2>
<p>In Synthesis, I/O includes all data flow among hardware devices and address spaces. Data move along logical channels called data channels, which connect sources of data with the destinations.
<h3>4.4.1 Producer/Consumer</h3>
<p>The Synthesis implementation of the channel model of I/O follows the well-known producer/consumer paradigm. Each data channel has a control flow that directs its data flow. Depending on the origin and scheduling of the control flow, a producer or consumer can be either active or passive. An active producer (or consumer) runs on a thread and calls functions submitting (or requesting) its output (or input). A thread performing writes is active. A passive producer (or consumer) does not run on its own; it sits passively, waiting for one of its I/O functions to be called, then using the thread that called the function to initiate the I/O. A TTY window is passive; characters appear on the window only in response to other threads' I/O. There are three cases of producer/consumer relationships, which we shall consider in turn.
<p>The simplest is an active producer and a passive consumer, or vice-versa. This case, called active-passive, has a simple implementation. When there is only one producer and one consumer, a procedure call does the job. If there are multiple producers, we serialize their access. If there are multiple consumers, each consumer is called in turn.
<p>The most common producer/consumer relationship has both an active producer and an active consumer. This case, called active-active, requires a queue to mediate between the two. For a single producer and a single consumer, an ordinary queue suffices. For cases with multiple participants on either the producer or consumer side, we use one of the optimistically-synchronized concurrent-access queues described in Section 5.2.2. Each queue may be synchronous (blocking) or asynchronous (using signals) depending on the situation.
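<p>The synchronous (blocking) form of the active-active case can be sketched with two threads and a bounded queue. Note the substitution: Python's <code>queue.Queue</code> is lock-based, not one of the optimistically synchronized queues of Section 5.2.2, and the <code>None</code> end-of-data sentinel is an assumption of this example rather than part of the Synthesis interface.

```python
import queue
import threading


def producer(channel, items):
    """Active producer: runs on its own thread and submits output."""
    for item in items:
        channel.put(item)        # blocks while the queue is full
    channel.put(None)            # assumed end-of-data marker


def consumer(channel, received):
    """Active consumer: runs on its own thread and requests input."""
    while True:
        item = channel.get()     # blocks until data is available
        if item is None:
            break
        received.append(item)


channel = queue.Queue(maxsize=2)     # small bound to force blocking
received = []
threads = [
    threading.Thread(target=producer, args=(channel, [1, 2, 3, 4])),
    threading.Thread(target=consumer, args=(channel, received)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(received)
```

<p>Neither side calls the other directly; the queue is the only shared object, and blocking on it is what the text calls the synchronous variant.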
<p>The last case is a passive producer and a passive consumer. Here, we use a pump quaject that reads data from the producer and writes it to the consumer. This works for multiple passive producers and consumers as well.
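<p>A pump is simply the active element placed between two passive ends. In the sketch below, <code>Source</code>, <code>Sink</code>, and <code>pump</code> are hypothetical names, and <code>None</code> is assumed to signal that the producer has no more data; in the real kernel the pump would run on a thread of its own.

```python
class Source:
    """Passive producer: hands out data only when read() is called."""

    def __init__(self, items):
        self._items = list(items)

    def read(self):
        return self._items.pop(0) if self._items else None


class Sink:
    """Passive consumer: accepts data only when write() is called."""

    def __init__(self):
        self.items = []

    def write(self, item):
        self.items.append(item)


def pump(source, sink):
    """Supply the control flow that neither passive end has."""
    while True:
        item = source.read()
        if item is None:         # assumed end-of-data convention
            break
        sink.write(item)


sink = Sink()
pump(Source("abc"), sink)
print(sink.items)
```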
<h3>4.4.2 Hardware Devices</h3>
<p>Physical I/O devices are encapsulated in quajects called device servers. The device server interface generally mirrors the basic, "raw" interface of the physical device. Its I/O operations typically include asynchronous read and write of fixed-length data records and device-specific query and control functions. Each device server may have its own thread(s) or not. A polling I/O server runs continuously on its own thread. An interrupt-driven server blocks after initialization. The server without threads runs when its physical device generates an interrupt, invoking one of its callentries. Device servers are created at boot time, one server for each device, and persist until the system is shut down. Device servers can also be added as the system runs, but this must be done from a kernel thread -- currently there is no protected, user-level way to do this.
<p>Higher-level I/O streams are created by composing a device server with one or more filter quajects. There are three important functions that a filter quaject can perform: mapping one style of interface to another (e.g., asynchronous to synchronous), mapping one data format to another (e.g., EBCDIC to ASCII, byte-reversal), and editing data (e.g., backspacing). For example, the Synthesis equivalent of the <span class=smallcaps>Unix</span> cooked tty interface is a filter that processes the output from the raw tty device server, buffers it, and performs editing as called for by the erase and kill control characters.
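<p>The editing function of such a filter can be sketched as a line buffer that reacts to two control characters. Everything specific here is an assumption for illustration: the class name <code>CookedFilter</code>, the choice of backspace as the erase character and Control-U as the kill character, and delivery of a line on newline through a callout supplied by the caller.

```python
ERASE = "\b"      # assumed erase character (backspace)
KILL = "\x15"     # assumed kill character (Control-U)


class CookedFilter:
    """Hypothetical filter between a raw tty server and its reader."""

    def __init__(self, deliver):
        self.deliver = deliver   # callout to the downstream consumer
        self.line = []           # characters buffered for the current line

    def put(self, ch):
        """Called once per character produced by the raw device."""
        if ch == ERASE:
            if self.line:
                self.line.pop()          # drop the last character
        elif ch == KILL:
            self.line.clear()            # discard the whole line
        elif ch == "\n":
            self.deliver("".join(self.line))
            self.line.clear()
        else:
            self.line.append(ch)


lines = []
tty = CookedFilter(lines.append)
for ch in "helo\blo\nabc\x15xyz\n":
    tty.put(ch)
print(lines)
```

<p>The raw device server stays unchanged; the cooked behaviour is obtained purely by placing the filter in the data channel.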
<h2>4.5 Virtual Memory</h2>
<p>A full discussion of virtual memory will not be presented in this dissertation because not all of the details have been worked out as of the time of this writing. Here, I merely assert that Synthesis does support virtual memory, but the model and interface are still in flux.
<h2>4.6 Summary</h2>
<p>The positive experience in using quajects shows that a highly efficient implementation of an object-based system can be achieved. The main ingredients of such an implementation are:
<ul>
<li>a procedural interface using callout and callentry references,
<li>explicit callback references for asynchronous return,
<li>run-time code generation and linking.
</ul>
Chapter 7 backs this up with measurements. But now, we will look at issues involving multiprocessors.
</div>
</body>
</html>