reshaped to accommodate language (some) approach

Torsten Ruger 2015-11-23 14:19:56 +02:00
parent af092c1209
commit d4bb141084
3 changed files with 322 additions and 152 deletions


@@ -3,10 +3,11 @@ layout: site
---
<div class="row vspace10">
<div class="span12">
<h2 class="center">A completely object oriented virtual machine</h2>
<h2 class="center">A completely object oriented machine</h2>
<div>
<p class="center"><span>
Leaving the old (c) world behind to go where no machine has gone before (or something like that)
A fully self describing object system without external dependencies capable of executing dynamic
object oriented languages like ruby or python.
</span></p>
</div>
</div>
@@ -14,61 +15,62 @@ layout: site
<div class="row vspace20">
<div class="span4">
<h2 class="center">Architecture</h2>
<p>
Salama is maybe the first successful attempt at writing a virtual machine without the use
of c or c tools.
It defines and implements an object virtual machine completely in object oriented terms,
using ruby to bootstrap itself.
<h2 class="center">Goal</h2>
<p>
The goal is to execute object oriented code without external dependencies, on modern hardware.
</p>
<p>
Just some of the features, most of which would not be possible in c:
<ul>
<li> Linked-List, not stack, based </li>
<li> Multiple return addresses based on type </li>
<li> Multiple implementations per function based on type </li>
<li> Implicit type tracking using adaptive code</li>
<li> Explicit <a href="/2015/06/20/the-static-call-chain.html">message and frame objects</a></li>
<li> <a href="http://book.salama-vm.org/register/machine.html">Register machine abstraction</a></li>
<li> <a href="http://book.salama-vm.org/object/instructions.html">Extensible</a> instruction set</li>
</ul>
Salama defines its own machine language (soml) to bridge the gap between the higher-level language
(ruby) and assembler. Both soml and assembler can be seen as layers towards the final
binary executables.
<p>
Having no external dependencies means the system must define its own object oriented system language
that compiles to assembler: a sort of object version of c, but without using c.
</p>
<p>
It must be possible to compile higher level, dynamic, object oriented languages into this
language, in the same way that c++ used to be compiled into c. So ruby compiles
to soml, which compiles to assembler, which is assembled into binaries. <b>No interpretation.</b>
</p>
<p>
Most of the system is defined in a higher level language (ruby), and only a small runtime,
mostly for operating system access, needs to be written in the system language.
</p>
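<p>The idea of explicit message and frame objects linked into a list, rather than pushed onto a stack, can be pictured with a few lines of Python. This is only a sketch under assumptions: salama itself is written in ruby and soml, and the <code>Message</code>/<code>Frame</code> fields and the <code>send</code> helper below are hypothetical, not the project's actual classes.</p>

```python
# A minimal sketch, assuming Python stand-ins for salama's Message and Frame
# objects; the field names and the send() helper are hypothetical.
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Frame:
    """Holds a method's local variables explicitly, instead of on a stack."""
    variables: dict = field(default_factory=dict)


@dataclass
class Message:
    """One call: receiver, method name, arguments and a link to the caller."""
    receiver: object
    name: str
    args: list
    caller: Optional["Message"] = None
    frame: Frame = field(default_factory=Frame)
    return_value: object = None


def send(caller, receiver, name, *args):
    """Create a new message linked to the calling one (no stack push)."""
    return Message(receiver, name, list(args), caller=caller)


def call_chain(message):
    """Walk the linked list of messages back to the outermost call."""
    names = []
    while message is not None:
        names.append(message.name)
        message = message.caller
    return names


main = Message(receiver=None, name="main", args=[])
outer = send(main, "hello", "length")
inner = send(outer, 5, "+", 1)
inner.frame.variables["tmp"] = 6
print(call_chain(inner))  # ['+', 'length', 'main']
```

Because each message points to its caller, the call chain is an ordinary object graph that can be walked and inspected like any other object.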
</div>
<div class="span4">
<h2 class="center">Status</h2>
<p>
A first version of the system language is now <a href="/soml/soml.html">done</a>.
The statically typed language is called SOML (salama object machine language). It has a roughly
ruby-like syntax but c-like semantics, and introduces several new concepts:
<ul>
<li> Object based memory (no global memory) </li>
<li> Multiple return addresses based on type </li>
<li> Multiple implementations per function based on type </li>
<li> Implicit type tracking using adaptive code</li>
<li> Explicit <a href="/2015/06/20/the-static-call-chain.html">message and frame objects</a></li>
<li> <a href="http://book.salama-vm.org/register/machine.html">Register machine abstraction</a></li>
<li> <a href="http://book.salama-vm.org/object/instructions.html">Extensible</a> instruction set</li>
</ul>
</p>
<p>
While the project is only getting on for two years old, it is starting to settle conceptually,
progress smoothly, and produce <b>working binaries</b>.
An abstract, risc-like register level provides some abstraction from the actual hardware. The
compiler compiles to this level, and a mapping to Arm is provided to produce <b>working binaries</b>.
</p>
<p>
In numbers, there are over <b>1000 commits</b>, 6 sub-projects, more than 10k lines of code
and well over 600 tests.
</p>
<p>
Maybe more importantly, there is <a href="/book.html">good documentation</a> along with an
evolved idea of how most of the difficult issues are solved. So while the executables are
still of "Hello world" quality, there are no conceptual problems left.
</p>
<p>
There is also an interpreter (mostly for testing) and a basic
<a href="https://github.com/salama/salama-debugger"> visual debugger</a>, which helps not only with
debugging but also with understanding the machine.
</p>
</div>
<div class="span4">
<h2 class="center">Docs</h2>
<p>
The short introduction is under the <a href="/salama/layers.html">architecture</a> menu.
</p>
<p>
The section on SOML gives an overview of the <a href="/soml/soml.html">system language</a>.
</p>
<p>
The full documentation is in the form of a gitbook and can be <a href="/book.html">viewed</a>
and <a href="https://github.com/salama/object-machine">edited</a>.
The full documentation is in the form of a gitbook and can be <a href="/book.html">viewed here.</a>
</p>
<p>
The <a href="/project/motivation.html">about</a> section has some info of when and how this


@@ -1,148 +1,162 @@
---
layout: salama
title: Salama, a simple and minimal oo machine
title: Salama architectural layers
---
<div class="row vspace10">
<div class="span12 center">
<h3><span>Salama layers</span></h3>
<p>Just a small primer (left over from the start); really, <a href="http://dancinglightning.gitbooks.io/the-object-machine/content/">the book</a> is the best starting point.</p>
</div>
</div>
<div class="row vspace20">
<div class="span11">
<h5>Machine Code, bare metal</h5>
<p>
This is the easiest part to understand, which is why it comes first. It is the code
from <a href="https://github.com/salama/salama-arm"> salama-arm </a>, which is quite stable.
It creates binary code according to the arm specs. All about shifting bits in the
right way.
<br/>
As an abstraction it is not far from assembler. I mapped the mnemonics to function calls, and the registers
can be symbols or Values (from the vm). But on the whole this is as low level as it gets.
<br/>
Different types of instructions are implemented by different classes. To make machine-dependent code possible,
those classes are derived from Vm versions.
<br/>
There is an intel directory which contains an expanded version of wilson, but it has yet to be made to fit into
the architecture. So for now salama produces arm code.
<br/>
There is an elf directory which builds actual executables, a mini implementation of the elf standard.
</p>
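<p>To give a feel for mapping mnemonics to function calls, here is a Python sketch (salama-arm itself is ruby, and these function names are hypothetical, not its API) that encodes the A32 <code>mov rd, #imm8</code> instruction by shifting bits into place according to the ARM data-processing format. Only the unrotated 8-bit immediate form is handled, and the condition field shows how every arm instruction carries its own condition.</p>

```python
# A minimal sketch of "mnemonics as function calls": one Python function per
# instruction, returning the 32-bit ARM (A32) encoding. Only MOV with an
# unrotated 8-bit immediate is covered; function names are hypothetical.
import struct

COND_AL = 0b1110      # "always" condition code, bits 31-28
OPCODE_MOV = 0b1101   # data-processing opcode for MOV, bits 24-21


def mov_imm(rd, imm8, cond=COND_AL):
    """Encode `mov rd, #imm8` as a data-processing instruction word."""
    if not 0 <= rd <= 15:
        raise ValueError("register must be r0..r15")
    if not 0 <= imm8 <= 0xFF:
        raise ValueError("this sketch only handles 8-bit immediates")
    word = cond << 28          # condition field
    word |= 1 << 25            # I bit: operand 2 is an immediate
    word |= OPCODE_MOV << 21   # opcode; S bit (20) and Rn (19-16) stay 0
    word |= rd << 12           # destination register, bits 15-12
    word |= imm8               # rotate field (11-8) is 0, imm8 in bits 7-0
    return word


def to_bytes(words):
    """Pack instruction words little-endian, as for a little-endian arm target."""
    return b"".join(struct.pack("<I", w) for w in words)


code = [mov_imm(0, 42), mov_imm(7, 1)]
print([hex(w) for w in code])  # ['0xe3a0002a', '0xe3a07001']
print(to_bytes(code).hex())    # 2a00a0e30170a0e3
```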
</div>
</div>
<div class="row">
<div class="span12">
<h5>Parsing, forever descending</h5>
<div class="span10">
<h4>Main Layers</h4>
<p>
Parsing is relatively straightforward too; its code is found in <a href="https://github.com/salama/salama-reader">
its own repository </a>.
It parses more than can be processed, but much less than full ruby.
<br/>
We all know ruby, so it's just a matter of getting the rules right.
If only! Ruby is full of niceties that actually make parsing it quite difficult. But at the moment that story
hasn't even started.
<br/>
Traditionally, yacc or bison or talk of lr or ll would come in here and all but a few would zone out. But llvm has
proven that recursive descent parsing is a viable alternative, also for big projects. And Parslet puts that into
a nice ruby framework for us.
<br/>
Parslet lets us use modules for parts of the parser, so those files are pretty self-explanatory.
Not everything is done, but it is a good start.
<br/>
Parslet also has a separate transformation pass, and that creates the AST. Those class names are also
easy, so you can guess what an IfExpression represents.
<br/>
Implementing an object system that executes object oriented languages takes a large system.
The parts, or abstraction layers, are detailed below.<br/>
It is important to understand the approach first though, as it differs from the usual
interpreter approach. The idea is to compile (e.g.) ruby. It may be easiest to compare this to a static
object oriented language like c++. When c++ was created, c++ code was translated into c, which
was then compiled to assembler, assembled into binary code, linked,
and executed. Compiling to binaries is what gives these languages speed, and is one reason
to compile ruby.<br/>
In a similar way to the c++ example, we need a language between ruby and assembler, as it is too
big a mental step from ruby to assembler. Of course one could try to compile to c, but
since c is not object oriented, that would mean dealing with all of c's non-oo heritage, like its
linking model, memory model, calling convention etc. (more on this in the book)<br/>
The layers are:
<ul>
<li> <b> Binary and cpu specific assembler.</b> This includes arm assembly and elf support
to produce a binary that can then read in ruby programs.</li>
<li> <b> Risc register machine abstraction </b> provides a level of machine abstraction, but
as the name says, quite a simple one.</li>
<li> <b> Soml, Salama object machine language, </b> which is like our object c: statically
typed and object oriented, with object oriented call semantics. </li>
<li> <b> Salama</b>, which is the layer compiling ruby code into soml, and includes
bootstrapping code.</li>
</ul>
</p>
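<p>Recursive descent, as mentioned in the parsing paragraph above, means one function per grammar rule, with the rules calling each other. The following hand-written Python sketch is not Parslet and not salama-reader; it assumes a toy grammar of integers, <code>+</code>, <code>*</code> and parentheses, and parses straight into AST node classes named in the same style as <code>IfExpression</code>.</p>

```python
# A minimal recursive descent parser for integer expressions with + and *,
# producing AST nodes. Grammar (assumed for illustration):
#   expr   := term ("+" term)*
#   term   := factor ("*" factor)*
#   factor := NUMBER | "(" expr ")"
import re
from dataclasses import dataclass


@dataclass
class IntegerExpression:
    value: int


@dataclass
class OperatorExpression:
    operator: str
    left: object
    right: object


TOKEN = re.compile(r"\s*(\d+|[+*()])")


def tokenize(source):
    tokens, pos = [], 0
    source = source.rstrip()
    while pos < len(source):
        match = TOKEN.match(source, pos)
        if not match:
            raise SyntaxError(f"unexpected input at {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    def __init__(self, source):
        self.tokens = tokenize(source)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self, expected=None):
        token = self.peek()
        if token is None or (expected and token != expected):
            raise SyntaxError(f"expected {expected or 'a token'}, got {token}")
        self.pos += 1
        return token

    # one method per grammar rule: this is what makes it "recursive descent"
    def expr(self):
        node = self.term()
        while self.peek() == "+":
            self.take("+")
            node = OperatorExpression("+", node, self.term())
        return node

    def term(self):
        node = self.factor()
        while self.peek() == "*":
            self.take("*")
            node = OperatorExpression("*", node, self.factor())
        return node

    def factor(self):
        if self.peek() == "(":
            self.take("(")
            node = self.expr()
            self.take(")")
            return node
        token = self.take()
        if not token.isdigit():
            raise SyntaxError(f"expected a number, got {token}")
        return IntegerExpression(int(token))

    def parse(self):
        node = self.expr()
        if self.peek() is not None:
            raise SyntaxError(f"trailing input: {self.peek()}")
        return node


print(Parser("1 + 2 * 3").parse())
```

Operator precedence falls out of the rule structure: <code>*</code> binds tighter because <code>term</code> sits below <code>expr</code>.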
</div>
</div>
<div class="row">
<div class="span12">
<h5>Virtual Machine</h5>
<div class="span10">
<h5>Binary , Arm and Elf</h5>
<p>
The Virtual machine layer is where it gets interesting, but also a little fuzzy.
<br/>
After some experimentation, the virtual machine layer has become a completely self-contained layer to describe and
implement an oo machine. In other words, it has no reference to any physical machine; that is the next layer down.
<br/>
One can get headaches quite easily while thinking about implementing an oo machine in oo; it's just so difficult to
find the boundaries. To determine those, I like to talk of types (not classes) for the objects (values) in which the
vm is implemented. It is also necessary to remove ambiguity about what message sending means.
<br/>
One way to think of this (it helps to stay sane) is to think of the types of the system known at compile time. In the
simplest case this could be object reference and integer. The whole vm functionality can be made to work with only
those two types, and it is not specified how the type information is stored, but of course there needs to be a
way to check it at run-time.
<br/>
The vm has an instruction set that, apart from basic integer manipulation, only allows for memory access into an
object. Instead of an implicit stack, we use activation frames and store all variables explicitly.
A physical machine will run binaries containing instructions that the cpu understands. With arm
being our main target, this means we need code to produce binary, which is contained in a
separate module <a href="https://github.com/salama/salama-arm"> salama-arm </a>. <br/>
To be able to run code on a unix based operating system, binaries need to be packaged in a
way that the os understands, so minimal elf support is included in the package. <br/>
Arm is a risc architecture, but, as anyone who knows it will attest, one with its own quirks.
For example, any instruction may be executed conditionally in arm, and there is no single instruction
to load an arbitrary 32-bit constant into a register. It is possible to create very dense code using all the arm
special features, but this is not implemented yet.
</p>
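<p>The missing 32-bit load can be made concrete. An arm data-processing immediate is an 8-bit value rotated right by an even number of bits, so a general 32-bit constant has to be built in several steps. A minimal Python sketch, assuming a simple byte-by-byte split into one <code>mov</code> and up to three <code>orr</code> instructions (assemblers may instead find fewer chunks or use a literal pool, and this is not necessarily what salama-arm does):</p>

```python
# A minimal sketch, assuming a plain byte-by-byte split (real assemblers
# search for fewer, better-rotated chunks or use a literal pool instead).
def ror32(value, amount):
    """Rotate a 32-bit value right by `amount` bits."""
    amount %= 32
    return ((value >> amount) | (value << (32 - amount))) & 0xFFFFFFFF


def split_constant(value):
    """Return (mnemonic, imm8, rotate) triples whose results OR to `value`.

    Each chunk is one byte of `value`; a byte at bit 8*k is an 8-bit
    immediate rotated right by (32 - 8*k) % 32 bits, and arm stores half
    of that amount in the 4-bit rotate field.
    """
    value &= 0xFFFFFFFF
    steps = []
    for k in range(4):
        byte = (value >> (8 * k)) & 0xFF
        if byte == 0:
            continue
        rotate = ((32 - 8 * k) % 32) // 2
        mnemonic = "mov" if not steps else "orr"
        steps.append((mnemonic, byte, rotate))
    if not steps:                       # value 0 still needs one instruction
        steps.append(("mov", 0, 0))
    return steps


def run(steps):
    """Replay the steps to check they rebuild the constant."""
    result = 0
    for mnemonic, imm8, rotate in steps:
        operand = ror32(imm8, 2 * rotate)
        result = operand if mnemonic == "mov" else result | operand
    return result


for mnemonic, imm8, rotate in split_constant(0x12345678):
    print(mnemonic, hex(ror32(imm8, 2 * rotate)))
```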
</div>
</div>
<div class="row">
<div class="span12">
<h5>Compilation in passes</h5>
<p>
Compilation happens in passes. A single pass is a small piece of code that does just a very small part of the
whole compilation.
<br/>
Logically there are four distinct steps. After parsing to an AST, we compile a data structure that includes instructions
for an object machine. The next step is a (still abstract) register machine, before the actual binary for the
arm is generated.
</p>
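<p>Compiling in passes amounts to threading a representation through a list of small functions. A minimal Python sketch, where the pass names and toy instruction tuples are hypothetical and chosen only to mirror the steps named above (AST, object machine, register machine, output); the last pass renders text as a stand-in for binary generation.</p>

```python
# A minimal sketch of compilation as a list of small passes. Each pass takes
# the current representation and returns the next one; the stage names and
# toy instruction tuples are hypothetical.
from typing import Callable, List

Pass = Callable[[object], object]


def ast_to_object_machine(ast):
    # ("send", receiver, name, args) stands in for an object machine instruction
    receiver, name, args = ast
    return [("send", receiver, name, args)]


def object_to_register_machine(instructions):
    # lower each send into register loads plus a call
    lowered = []
    for _, receiver, name, args in instructions:
        lowered.append(("load", "r1", receiver))
        for index, arg in enumerate(args, start=2):
            lowered.append(("load", f"r{index}", arg))
        lowered.append(("call", name))
    return lowered


def register_machine_to_text(instructions):
    # stand-in for binary generation: render one line per instruction
    return "\n".join(" ".join(str(part) for part in ins) for ins in instructions)


def compile_with(passes: List[Pass], source):
    result = source
    for compiler_pass in passes:
        result = compiler_pass(result)
    return result


PIPELINE = [ast_to_object_machine, object_to_register_machine, register_machine_to_text]
print(compile_with(PIPELINE, ("5", "+", ["1"])))
```

Keeping each pass small means a new optimisation or lowering step is just one more function in the list.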
</div>
</div>
<div class="row">
<div class="span12">
<div class="span10">
<h5>Register Machine</h5>
<p>
The Register machine layer is a relatively close abstraction of hardware.
The Register machine layer is a relatively close abstraction of risc hardware, but without the
quirks.
<br/>
The step from OO machine to Arm had proved too large, partially also due to the cryptic arm names.
The register machine has registers, indexed addressing, operators, branches and everything
needed for the next layer. It does not try to abstract every possible machine feature
(like llvm), but rather "objectifies" the risc view to provide what is needed for soml, the
next layer up.
<br/>
The register machine has registers, indexed addressing, a pc and all the normal things one would expect.
The machine has its own (abstract) instruction set, which serves mainly to give understandable names.
<br/>
The mapping to arm is quite straightforward.
The machine has its own (abstract) instruction set, and the mapping to arm is quite
straightforward. Since the instruction set is implemented as derived classes, additional
instructions may be defined and used later, as long as a translation is provided for them too.
In other words, the instruction set is extensible (unlike cpu instruction sets).
</p>
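<p>The extensibility argument (instructions as derived classes, each with a translation) can be sketched in Python. The class names and the translator below are hypothetical; the point is only that a new instruction needs a subclass plus a translation rule, and nothing else has to change.</p>

```python
# A minimal sketch, assuming hypothetical class names: register machine
# instructions are subclasses of one base class, and a translator maps each
# class to arm assembly text.
class Instruction:
    pass


class LoadConstant(Instruction):
    def __init__(self, register, value):
        self.register, self.value = register, value


class OperatorInstruction(Instruction):
    def __init__(self, operator, target, source):
        self.operator, self.target, self.source = operator, target, source


class ArmTranslator:
    def __init__(self):
        self.rules = {}

    def register(self, instruction_class, rule):
        self.rules[instruction_class] = rule

    def translate(self, instruction):
        # walk the class hierarchy so subclasses can reuse a parent's rule
        for cls in type(instruction).__mro__:
            if cls in self.rules:
                return self.rules[cls](instruction)
        raise NotImplementedError(f"no arm translation for {type(instruction).__name__}")


translator = ArmTranslator()
translator.register(LoadConstant, lambda i: f"mov {i.register}, #{i.value}")
translator.register(OperatorInstruction, lambda i: f"{i.operator} {i.target}, {i.target}, {i.source}")


# a later extension: a new instruction plus its translation
class Increment(Instruction):
    def __init__(self, register):
        self.register = register


translator.register(Increment, lambda i: f"add {i.register}, {i.register}, #1")

program = [LoadConstant("r0", 5), OperatorInstruction("add", "r0", "r1"), Increment("r0")]
print("\n".join(translator.translate(i) for i in program))
```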
<p>
Basic object oriented concepts are needed already at this level, to be able to generate a whole
self-contained system, i.e. what an object is, a class, a method etc. This minimal runtime is called
parfait and will eventually be coded in soml. But since the same objects exist at runtime and at
compile time, it will then be translated back to ruby for use at compile time. Currently there
are two versions of the code, in ruby and soml, kept in sync by hand. More about parfait below.
</p>
<p>
Since working at this low machine level (essentially assembler) is not easy to follow for
everyone, an interpreter was created. Later a graphical interface, a kind of
<a href="https://github.com/salama/salama-debugger"> visual debugger </a>, was added.
Visualizing the control flow and being able to see values update immediately helped
tremendously in creating this layer. And the interpreter helps in testing, i.e. in keeping the
layer working as the code changes.
</p>
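<p>The interpreter mentioned above is what makes this layer observable. A minimal Python sketch of a register machine interpreter, assuming a toy tuple-based instruction set rather than salama's real one, executes one instruction per step, so every register change is the kind of event a visual debugger could display.</p>

```python
# A minimal sketch of an interpreter for a toy register machine, assuming a
# tiny hypothetical instruction set given as tuples.
def interpret(program, max_steps=1000):
    registers = {"pc": 0}
    labels = {ins[1]: index for index, ins in enumerate(program) if ins[0] == "label"}
    steps = 0
    while registers["pc"] < len(program):
        if steps >= max_steps:
            raise RuntimeError("step limit reached")
        op, *args = program[registers["pc"]]
        registers["pc"] += 1
        steps += 1
        if op == "load":            # load rX, constant
            registers[args[0]] = args[1]
        elif op == "add":           # add rX, rY  ->  rX = rX + rY
            registers[args[0]] += registers[args[1]]
        elif op == "sub":           # sub rX, rY  ->  rX = rX - rY
            registers[args[0]] -= registers[args[1]]
        elif op == "branch_if_nonzero":
            if registers[args[0]] != 0:
                registers["pc"] = labels[args[1]]
        elif op == "label":
            pass
        else:
            raise ValueError(f"unknown instruction {op}")
    return registers


# sum 1..4 with a loop: r0 accumulates, r1 counts down
program = [
    ("load", "r0", 0),
    ("load", "r1", 4),
    ("load", "r2", 1),
    ("label", "loop"),
    ("add", "r0", "r1"),
    ("sub", "r1", "r2"),
    ("branch_if_nonzero", "r1", "loop"),
]
print(interpret(program)["r0"])  # 10
```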
</div>
</div>
<div class="row">
<div class="span10">
<h5>Soml, Salama object machine language</h5>
<p>
Soml is probably the largest single part of the system, and much more information can be found
<a href="/soml/soml.html"> here </a>.
<br/>
Before soml, a more traditional virtual machine approach was taken and abandoned. The language
is easy to understand and provides a good abstraction, both in terms of object orientation
and in terms of how this is expressed in the register model. <br/>
It is like ruby without the dynamic aspects, but typed. <br/>
In broad strokes it consists of:
<ul>
<li> <b> Parser:</b> Currently a peg parser, though a hand coded one is planned.
The result is an AST.</li>
<li> <b> Compiler:</b> compiles the ast into a sequence of Register instructions
and runtime objects (classes, methods etc).</li>
<li> <b> Parfait: </b> Is the runtime, i.e. the minimal set of objects needed to
create a binary with the required information to be dynamic.</li>
<li> <b> Builtin: </b> A very small set of primitives that are impossible to express
in soml (remembering that parfait will be expressed in soml eventually)</li>
</ul>
</p>
<p>
To summarize a few soml features that may be unusual:
<ul>
<li> <b> Message based calling:</b> Calling is completely object oriented (not stack based)
and uses Message and Frame objects.</li>
<li> <b> Return addresses:</b> A soml method call may return to several addresses, according
to type, and in case of exception.</li>
<li> <b> Overloaded arguments:</b> A method is defined by name, but may have several
implementations for different types of the arguments (statically matched).</li>
</ul>
</p>
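<p>Returning to one of several addresses by type can be imitated with continuations. In this Python sketch (hypothetical names throughout; callables stand in for return addresses, and the dictionary lookup stands in for what would be a direct jump in the machine) the caller passes one return address per result type plus one for exceptions, so it needs no type check after the call.</p>

```python
# A minimal sketch, assuming continuations (Python callables) stand in for
# return addresses. The callee's result picks which one the call continues at.
def call(method, args, returns):
    """Invoke `method`, then continue at the return address for its result."""
    try:
        result = method(*args)
    except Exception as error:
        return returns["exception"](error)
    return returns[type(result).__name__](result)


def lookup(table, key):
    # may return an int, a str, or raise KeyError
    return table[key]


table = {"answer": 42, "name": "salama"}
returns = {
    "int": lambda value: f"integer path: {value + 1}",
    "str": lambda value: f"object path: {value.upper()}",
    "exception": lambda error: f"exception path: {type(error).__name__}",
}
print(call(lookup, (table, "answer"), returns))   # integer path: 43
print(call(lookup, (table, "name"), returns))     # object path: SALAMA
print(call(lookup, (table, "missing"), returns))  # exception path: KeyError
```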
</div>
</div>
<div class="row">
<div class="span12">
<h5>Parfait</h5>
<div class="span10">
<h5>Salama</h5>
<p>
Ruby is very dynamic, and so it has a relatively large run-time. Parfait is that Run-time.
To compile and run ruby, we need to parse and compile ruby code. To compile ruby to soml a clear
mapping has to be achieved. Particularly the dynamic aspects, and typing need to be addressed.
<br/>
Parfait includes all the functionality a ruby program could not do without, Array, Hash, Object, Class, etc.
<br/>
Parfait does not include any stdlib or indeed core functionality if it doesn't have to.
<br/>
Parfait is coded in ruby, but not all functionality can be coded in ruby, so there is Builtin.
While parsing ruby is quite a difficult task, it has already been implemented in pure ruby
<a href="https://github.com/whitequark/parser"> here </a>. The output of the parser is again
an ast, which needs to be compiled to soml. <br/>
The dynamic aspects of ruby are actually relatively easy to handle, once the whole system is
in place, because the whole system is written in ruby without external dependencies.
Since (when finished) it can compile ruby, it can do so to produce a binary. This binary can
then contain the whole of the system, and so the resulting binary will be able to produce
binary code when it runs. With small changes to the linking process (easy in ruby!) it can
then extend itself.
</p>
<p>
The type aspect is more tricky: ruby is not typed, and soml, after all, is. If everything
were an object (as we like to pretend in ruby), we could just do a lot of dynamic checking and
possibly later introduce some caching. But not everything is an object: minimally, integers
are not, and maybe also floats and other values. The distinction between what is an integer
and what is an object has sprouted an elaborate type system, which is (by necessity) present in
soml (see there).
</p>
<p>
The idea (because it hasn't been implemented yet) is to have different functions for different
types. The soml layer defines object layout and types and also lets us return to different
places from a function (in effect a soml function call is like an if). By using this, we can
compile a single ruby method into several soml methods. Each such method is typed, i.e. all
arguments and variables are of known type. According to these types we can call methods by
their signatures. We can also autogenerate error methods for unhandled types, and predict
that only a fraction of the possible combinations will actually be needed.
</p>
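<p>The plan of turning one untyped ruby method into several typed soml methods, with generated error methods for combinations nobody handles, can be sketched in Python. Everything here is hypothetical: <code>typed</code> registers a variant per argument-type signature, and <code>resolve</code> picks a variant from types known at the call site (before any call happens), creating an error method once when no variant exists.</p>

```python
# A minimal sketch, assuming hypothetical names: one untyped method becomes
# several typed variants, registered by argument types.
variants = {}


def typed(name, *arg_types):
    """Register a typed implementation of `name` for `arg_types`."""
    def decorator(function):
        variants[(name, arg_types)] = function
        return function
    return decorator


@typed("plus", int, int)
def plus_int_int(a, b):
    return a + b


@typed("plus", str, str)
def plus_str_str(a, b):
    return a + b


def resolve(name, *arg_types):
    """Pick the variant for known argument types, or generate an error method."""
    if (name, arg_types) in variants:
        return variants[(name, arg_types)]
    signature = ", ".join(t.__name__ for t in arg_types)

    def error_method(*args):
        raise TypeError(f"{name}({signature}) is not implemented")
    variants[(name, arg_types)] = error_method   # generated once, then reused
    return error_method


add_ints = resolve("plus", int, int)             # chosen before any call happens
print(add_ints(2, 3))                            # 5
print(resolve("plus", str, str)("sal", "ama"))   # salama
```

Only the signatures that call sites actually ask for get an entry, which matches the expectation that just a fraction of all type combinations will be needed.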
</div>
</div>
<div class="row">
<div class="span12">
<h5>Builtin</h5>
<p>
Builtin is the part of the vm that cannot be coded in ruby. It is not, as may be imagined, a set of instructions,
but rather a set of modules.
<br/>
Modules of Builtin have functions that implement functionality that cannot be coded in ruby, for example array access.
The functions take a VM::Method and provide the code as a set of instructions. This may be seen as the assembler
layer of the vm.
</p>
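<p>The shape of a Builtin function, one that receives a method object and fills in its instructions, can be sketched as follows. The <code>Method</code> class (standing in for VM::Method), the <code>ArrayBuiltin</code> module, the instruction tuples and the flat dictionary standing in for memory are all hypothetical simplifications; a real array would, for instance, have an object header in front of its items.</p>

```python
# A minimal sketch, assuming a hypothetical Method class and tuple-based
# register instructions; none of these names are salama's real API.
from dataclasses import dataclass, field


@dataclass
class Method:
    name: str
    instructions: list = field(default_factory=list)


class ArrayBuiltin:
    """Groups builtin functions for arrays, like a module of Builtin."""

    @staticmethod
    def get(method):
        # receiver (object address) in r1, index in r2, result in r0;
        # object headers are ignored in this simplification
        method.instructions += [
            ("add", "r3", "r1", "r2"),       # address = object + index
            ("load_indexed", "r0", "r3"),    # r0 = memory[address]
            ("return",),
        ]
        return method


def run(method, registers, memory):
    """Execute the generated instructions against a dict standing in for memory."""
    for op, *args in method.instructions:
        if op == "add":
            registers[args[0]] = registers[args[1]] + registers[args[2]]
        elif op == "load_indexed":
            registers[args[0]] = memory[registers[args[1]]]
        elif op == "return":
            return registers["r0"]
    raise RuntimeError("method fell off the end without returning")


get = ArrayBuiltin.get(Method("get"))
memory = {100: "a", 101: "b", 102: "c"}   # an "array" object at address 100
print(run(get, {"r1": 100, "r2": 2}, memory))  # c
```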
</div>
</div>

soml/layers.html (new file, 154 lines)

@@ -0,0 +1,154 @@
---
layout: salama
title: Salama architectural layers
---
<div class="row">
<div class="span10">
<h4>Main Layers</h4>
<p>
Implementing an object system that executes object oriented languages takes a large system.
The parts, or abstraction layers, are detailed below.<br/>
It is important to understand the approach first though, as it differs from the usual
interpreter approach. The idea is to compile (e.g.) ruby. It may be easiest to compare this to a static
object oriented language like c++. When c++ was created, c++ code was translated into c, which
was then compiled to assembler, assembled into binary code, linked,
and executed. Compiling to binaries is what gives these languages speed, and is one reason
to compile ruby.<br/>
In a similar way to the c++ example, we need a language between ruby and assembler, as it is too
big a mental step from ruby to assembler. Of course one could try to compile to c, but
since c is not object oriented, that would mean dealing with all of c's non-oo heritage, like its
linking model, memory model, calling convention etc. (more on this in the book)<br/>
The layers are:
<ul>
<li> <b> Binary and cpu specific assembler.</b> This includes arm assembly and elf support
to produce a binary that can then read in ruby programs.</li>
<li> <b> Risc register machine abstraction </b> provides a level of machine abstraction, but
as the name says, quite a simple one.</li>
<li> <b> Soml, Salama object machine language, </b> which is like our object c: statically
typed and object oriented, with object oriented call semantics. </li>
<li> <b> Salama</b>, which is the layer compiling ruby code into soml, and includes
bootstrapping code.</li>
</ul>
</p>
</div>
</div>
<div class="row">
<div class="span10">
<h5>Binary , Arm and Elf</h5>
<p>
A physical machine will run binaries containing instructions that the cpu understands. With arm
being our main target, this means we need code to produce binary, which is contained in a
separate module <a href="https://github.com/salama/salama-arm"> salama-arm </a>. <br/>
To be able to run code on a unix based operating system, binaries need to be packaged in a
way that the os understands, so minimal elf support is included in the package. <br/>
Arm is a risc architecture, but, as anyone who knows it will attest, one with its own quirks.
For example, any instruction may be executed conditionally in arm, and there is no single instruction
to load an arbitrary 32-bit constant into a register. It is possible to create very dense code using all the arm
special features, but this is not implemented yet.
</p>
</div>
</div>
<div class="row">
<div class="span10">
<h5>Register Machine</h5>
<p>
The Register machine layer is a relatively close abstraction of risc hardware, but without the
quirks.
<br/>
The register machine has registers, indexed addressing, operators, branches and everything
needed for the next layer. It does not try to abstract every possible machine feature
(like llvm), but rather "objectifies" the risc view to provide what is needed for soml, the
next layer up.
<br/>
The machine has its own (abstract) instruction set, and the mapping to arm is quite
straightforward. Since the instruction set is implemented as derived classes, additional
instructions may be defined and used later, as long as a translation is provided for them too.
In other words, the instruction set is extensible (unlike cpu instruction sets).
</p>
<p>
Basic object oriented concepts are needed already at this level, to be able to generate a whole
self-contained system, i.e. what an object is, a class, a method etc. This minimal runtime is called
parfait and will eventually be coded in soml. But since the same objects exist at runtime and at
compile time, it will then be translated back to ruby for use at compile time. Currently there
are two versions of the code, in ruby and soml, kept in sync by hand. More about parfait below.
</p>
<p>
Since working at this low machine level (essentially assembler) is not easy to follow for
everyone, an interpreter was created. Later a graphical interface, a kind of
<a href="https://github.com/salama/salama-debugger"> visual debugger </a>, was added.
Visualizing the control flow and being able to see values update immediately helped
tremendously in creating this layer. And the interpreter helps in testing, i.e. in keeping the
layer working as the code changes.
</p>
</div>
</div>
<div class="row">
<div class="span10">
<h5>Soml, Salama object machine language</h5>
<p>
Soml is probably the largest single part of the system, and much more information can be found
<a href="/soml/soml.html"> here </a>.
<br/>
Before soml, a more traditional virtual machine approach was taken and abandoned. The language
is easy to understand and provides a good abstraction, both in terms of object orientation
and in terms of how this is expressed in the register model. <br/>
It is like ruby without the dynamic aspects, but typed. <br/>
In broad strokes it consists of:
<ul>
<li> <b> Parser:</b> Currently a peg parser, though a hand coded one is planned.
The result is an AST.</li>
<li> <b> Compiler:</b> compiles the ast into a sequence of Register instructions
and runtime objects (classes, methods etc).</li>
<li> <b> Parfait: </b> Is the runtime, i.e. the minimal set of objects needed to
create a binary with the required information to be dynamic.</li>
<li> <b> Builtin: </b> A very small set of primitives that are impossible to express
in soml (remembering that parfait will be expressed in soml eventually)</li>
</ul>
</p>
</div>
</div>
<div class="row">
<div class="span10">
<h5>Salama</h5>
<p>
</p>
</div>
</div>
</div>
<div class="row">
<div class="span12">
<h5>Parfait</h5>
<p>
Ruby is very dynamic, and so it has a relatively large run-time. Parfait is that Run-time.
<br/>
Parfait includes all the functionality a ruby program could not do without, Array, Hash, Object, Class, etc.
<br/>
Parfait does not include any stdlib or indeed core functionality if it doesn't have to.
<br/>
Parfait is coded in ruby, but not all functionality can be coded in ruby, so there is Builtin.
</p>
</div>
</div>
<div class="row">
<div class="span12">
<h5>Builtin</h5>
<p>
Builtin is the part of the vm that cannot be coded in ruby. It is not, as may be imagined, a set of instructions,
but rather a set of modules.
<br/>
Modules of Builtin have functions that implement functionality that cannot be coded in ruby, for example array access.
The functions take a VM::Method and provide the code as a set of instructions. This may be seen as the assembler
layer of the vm.
</p>
</div>
</div>