diff --git a/index.html b/index.html
index e42f9b9..39a7663 100755
--- a/index.html
+++ b/index.html
@@ -3,10 +3,11 @@
 layout: site
 ---
-

A completely object oriented virtual machine

+

A completely object oriented machine

- Leaving the old (c) world behind to go where no machine has gone before (or something like that) + A fully self describing object system without external dependencies capable of executing dynamic + object oriented languages like ruby or python.

@@ -14,61 +15,62 @@ layout: site
-

Architecture

-

- Salama is maybe the first successful attempt at writing a virtual machine without the use - of c or c tools. - It defines and implements an object virtual machine completely in object oriented terms, - using ruby to bootstrap itself. +

Goal

+

+ The goal is to execute object oriented code without external dependencies, on modern hardware.

-

- Just some of the features, most of which would not be possible in c: -

- Salama defines is's own machine language (soml) to bridge the gap between the higher language - (ruby) and assembler. Both soml and assembler can be seens as layers towards the final - binary executables +

+ No external dependencies means a system that defines an object oriented system language + that compiles to assembler. A sort of object version of c, but without using c. +

+

+ It must be possible to compile higher level, dynamic, object oriented languages into this + language, in a similar way that c++ is compiled into c (at least used to be). So ruby compiles + to soml which compiles to assembler which compiles to binaries. No interpretation. +

+

+ Most of the system is defined in a higher level language (ruby) and only a small runtime, + mostly for operating system access, needs to be written in the system language.

Status

+

+ A first version of the system language is now done. + The statically typed language is called SOML (salama object machine language), has a roughly + ruby-ish syntax with c-ish semantics, and introduces several new concepts: +

+

- While the project is just getting on two years, it is starting to settle conceptually, - progress smoothly, and produce working binaries. + An abstract risc like register level defines some abstraction from the actual hardware. The + compiler compiles to this level, but a mapping to Arm is provided to produce working binaries.

-

- In numbers, there are over 1000 commits, 6 sub-projects, more than 10k lines of code - and well over 600 tests. -

-

- Maybe more importantly there is good documentation along with an - evolved idea of how most of the difficult issues are solved. So while the executables are - still of the "Hello world" quality, there are no coneptual problems anymore. -

-

+

There is also an interpreter (mostly for testing) and a basic visual debugger which not only helps debugging, but also understanding of the machine.

-

Docs

The short introduction is under the architecture menu. +

+

+ The section on SOML gives an overview of the system language.

- The full documentation is in form of a gitbook and can be viewed , - and edited + The full documentation is in the form of a gitbook and can be viewed here.

The about section has some info of when and how this
diff --git a/salama/layers.html b/salama/layers.html
index 89b732a..7f9ef89 100644
--- a/salama/layers.html
+++ b/salama/layers.html
@@ -1,148 +1,162 @@
 ---
 layout: salama
-title: Salama, a simple and minimal oo machine
+title: Salama architectural layers
 ---
-

-
-

Salama layers

-

Just a small primer (left over from the start), really the book is the best starting point

-
-
-
-
-
Machine Code, bare metal
-

- This is the easy to understand part, that's why it's first, it's the code - from salama-arm , which is quite stable. - It creates binary code according to the arm specs. All about shifting bits in the - right way. -
- As an abstraction it is not far away from assembler. I mapped the memnonics to function calls and the registers - can be symbols or Values (from vm). But on the whole this is as low level as it gets. -
- Different types of instructions are implemented by different classes. To make machine dependant code possible, - those classes are derived from Vm versions. -
- There is an intel directory which contains an expanded version of wilson, but it has yet to be made to fit into - the architecture. So for now salama produces arm code. -
- There is an elf directory wich builds actual executables, a mini implementation of the elf standard. -

-
-
- -
-
-
Parsing, forever descending
+
+

Main Layers

- Parsing is relatively straightforward too, it's code is found in - it's own repository . - It parses more than can be processed, but much less than ruby is. -
- We all know ruby, so it's just a matter of getting the rules right. - If only! Ruby is full of niceties that actually make parsing it quite difficult. But at the moment that story - hasn't even started. -
- Traditionally, yacc or bison or talk of lr or ll would come in here and all but a few would zone out. But llvm has - proven that recursive descent parsing is a viable alternative, also for big projects. And Parslet puts that into - a nice ruby framework for us. -
- Parslet lets us use modules for parts of the parser, so those files are pretty self-explanitory. - Not all is done, but a good start. -
- Parslet also has a seperate Transformation pass, and that creates the AST. Those class names are also - easy, so you can guess what an IfExpression represents. -
+ To implement an object system to execute object oriented languages takes a large system. + The parts or abstraction layers are detailed below.
+ It is important to understand the approach first though, as it differs from the normal + interpretation. The idea is to compile (eg) ruby. It may be easiest to compare to a static + object oriented language like c++. When c++ was created, c++ code was translated into c, which + was then translated into assembler, which was translated to binary code, which was linked + and executed. Compiling to binaries is what gives these languages speed, and is one reason + to compile ruby.
+ In a similar way to the c++ example, we need a language between ruby and assembler, as it is too + big a mental step from ruby to assembler. Of course one could try to compile to c, but + since c is not object oriented that would mean dealing with all of c's non oo heritage, like + linking model, memory model, calling convention etc. (more on this in the book)
+ The layers are: +

    +
  • Binary and cpu specific assembler. This includes arm assembly and elf support + to produce a binary that can then read in ruby programs
  • +
  • Risc register machine abstraction provides a level of machine abstraction, but + as the name says, quite a simple one.
  • +
  • Soml, Salama object machine language, which is like our object c. Statically + typed object oriented with object oriented call semantics.
  • +
  • Salama, which is the layer compiling ruby code into soml and includes + bootstrapping code
  • +

-
-
Virtual Machine
+
+
Binary , Arm and Elf

- The Virtual machine layer is where it gets interesting, but also a little fuzzy. -
- After some trying around the virtual machine layer has become a completely self contained layer to describe and - implement an oo machine. In other words it has no reference to any physical machine, that is the next layer down. -
- One can get headaches quite easily while thinking about implementing an oo machine in oo, it's just so difficult to - find the boundaries. To determine those, i like to talk of types (not classes) for the objects (values) in which the - vm is implemented. Also it is neccessary to remove ambiguity about what message sending means. -
- One way to think of this (helps to keep sane) is to think of the types of the system known at compile time. In the - simplest case this could be object reference and integer. The whole vm functionality can be made to work with only - those two types, and it is not specified how the type information is stored. but off course there needs to be a - way to check it at run-time. -
- The vm has an instruction set that, apart from basic integer manipulation, only alows for memory access into an - object. Instead of an implicit stack, we use activation frames and store all variables explicitly. + A physical machine will run binaries containing instructions that the cpu understands. With arm + being our main target, this means we need code to produce binary, which is contained in a + separate module salama-arm .
+ To be able to run code on a unix based operating system, binaries need to be packaged in a + way that the os understands, so minimal elf support is included in the package.
+ Arm is a risc architecture but, as anyone who knows it will attest, one with its own quirks. + For example any instruction may be executed conditionally in arm. And there is no 32bit + register load instruction. It is possible to create very dense code using all the arm + special features, but this is not implemented yet.
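The missing 32bit register load can be illustrated with a small sketch. It assumes the movw/movt instruction pair of newer arm cores, which load a register 16 bits at a time. The helper names `split_constant` and `load_constant` are hypothetical and are not taken from salama-arm, which may well solve this differently.

```ruby
# Hypothetical helpers, not part of salama-arm.
# Split a 32 bit constant into the two 16 bit halves that a
# movw/movt instruction pair would load.
def split_constant(value)
  raise ArgumentError, "not a 32 bit value" unless (0..0xFFFF_FFFF).cover?(value)
  low  = value & 0xFFFF          # movw loads this and clears the top half
  high = (value >> 16) & 0xFFFF  # movt loads this into the top half
  [low, high]
end

# Produce the assembler lines needed to load a constant into a register.
def load_constant(register, value)
  low, high = split_constant(value)
  code = [format("movw %s, #0x%04x", register, low)]
  # The second instruction is only needed when the top half is non zero.
  code << format("movt %s, #0x%04x", register, high) unless high.zero?
  code
end

puts load_constant("r0", 0x12345678)
```

Small constants need a single instruction, anything above 16 bits needs two, which is one of the quirks a register machine layer can hide.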

-

-
-
Compilation in passes
-

- Compilation happens in Passes. A single pass is a small piece of code to do just a very small part of the - whole compilation. -
- Logically there are four distinct steps. From the parsed AST we compile a datastructure that includes instructions - for an object machine. The next step is a (still abstract) register machine, before the actual binary for the - arm is generated. -

-

-
-
- -
-
+
Register Machine

- The Register machine layer is a relatively close abstraction of hardware. + The Register machine layer is a relatively close abstraction of risc hardware, but without the + quirks.
- The step from OO machine to Arm had proved to large, also partially due to the cryptic arm names. + The register machine has registers, indexed addressing, operators, branches and everything + needed for the next layer. It does not try to abstract every possible machine feature + (like llvm), but rather "objectifies" the risc view to provide what is needed for soml, the + next layer up.
- The register machine has registers, indexed addressing, a pc and all the sort of normal things one would expect. - The machine has it's own (abstract) instruction set, which serves mainly to give understandable names. -
- The mapping to arm is quite straightforward. + The machine has its own (abstract) instruction set, and the mapping to arm is quite + straightforward. Since the instruction set is implemented as derived classes, additional + instructions may be defined and used later, as long as translation is provided for them too. + In other words the instruction set is extensible (unlike cpu instruction sets).
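As a rough sketch of what an instruction set built from derived classes could look like: every class name and the `to_arm` method below are assumptions made for illustration, not the actual salama classes. Each instruction carries its own translation, so a later addition only has to supply that one method.

```ruby
# All names here are hypothetical, for illustration only.
class Instruction
  # Every instruction must say how it translates to arm assembler lines.
  def to_arm
    raise NotImplementedError, "#{self.class} has no arm translation"
  end
end

# Load a small constant into a register.
class LoadConstant < Instruction
  def initialize(register, value)
    @register = register
    @value = value
  end

  def to_arm
    [format("mov %s, #%d", @register, @value)]
  end
end

# Copy one register into another.
class Transfer < Instruction
  def initialize(from, to)
    @from = from
    @to = to
  end

  def to_arm
    [format("mov %s, %s", @to, @from)]
  end
end

# An instruction added later: it is expressed through existing ones,
# so its translation comes for free.
class Swap < Instruction
  def initialize(first, second, scratch)
    @first = first
    @second = second
    @scratch = scratch
  end

  def to_arm
    [Transfer.new(@first, @scratch),
     Transfer.new(@second, @first),
     Transfer.new(@scratch, @second)].flat_map(&:to_arm)
  end
end

# Translating a program is just asking every instruction for its arm code.
def translate(instructions)
  instructions.flat_map(&:to_arm)
end

puts translate([LoadConstant.new("r0", 5), Swap.new("r0", "r1", "r2")])
```

The translator itself never changes when an instruction is added, which is the sense in which such a set is extensible.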

+

+ Basic object oriented concepts are needed already at this level, to be able to generate a whole + self contained system. Ie what an object is, a class, a method etc. This minimal runtime is called + parfait and will be coded in soml eventually. But since it is the same objects at runtime and + compile time, it will then be translated back to ruby for use at compile time. Currently there + are two versions of the code, in ruby and soml, being hand synchronized. More about parfait below. +

+

+ Since working at this low machine level (essentially assembler) is not easy to follow for + everyone, an interpreter was created. Later a graphical interface, a kind of + visual debugger was added. + Visualizing the control flow and being able to see values updated immediately helped + tremendously in creating this layer. And the interpreter helps in testing, ie keeping it + working in the face of developer change. +

+
+
+ +
+
+
Soml, Salama object machine language
+

+ Soml is probably the largest single part of the system and much more information can be found + here . +
+ Before soml, a more traditional virtual machine approach was taken and abandoned. The language + is easy to understand and provides a good abstraction, both in terms of object orientation, + and in terms of how this is expressed in the register model.
+ It is like ruby without the dynamic aspects, but typed.
+ In broad strokes it consists of: +

    +
  • Parser: Currently a peg parser, though a hand coded one is planned. + The result of which is an AST
  • +
  • Compiler: compiles the ast into a sequence of Register instructions + and runtime objects (classes, methods etc)
  • +
  • Parfait: Is the runtime, ie the minimal set of objects needed to + create a binary with the required information to be dynamic
  • +
  • Builtin: A very small set of primitives that are impossible to express + in soml (remembering that parfait will be expressed in soml eventually)
  • +
+

+

+ Just to summarize a few of soml features that are maybe unusual: +

    +
  • Message based calling: Calling is completely object oriented (not stack based) + and uses Message and Frame objects.
  • +
  • Return addresses: A soml method call may return to several addresses, according + to type, and in case of exception
  • +
  • Overloaded arguments: A method is defined by name, but may have several + implementations for different types of the arguments (statically matched)
  • +

-
-
Parfait
+
+
Salama

- Ruby is very dynamic, and so it has a relatively large run-time. Parfait is that Run-time. + To compile and run ruby, we need to parse and compile ruby code. To compile ruby to soml a clear + mapping has to be achieved. Particularly the dynamic aspects, and typing need to be addressed.
- Parfait includes all the functionality a ruby program could not do without, Array, Hash, Object, Class, etc. -
- Parfait does not include any stdlib or indeed core functionality if it doesn't have too. -
- Parfait is coded in ruby, but not all functionality can be coded in ruby, so there is Builtin + While parsing ruby is quite a difficult task, it has already been implemented in pure ruby + here . The output of the parser is again + an ast, which needs to be compiled to soml.
+ The dynamic aspects of ruby are actually relatively easy to handle, once the whole system is + in place, because the whole system is written in ruby without external dependencies. + Since (when finished) it can compile ruby, it can do so to produce a binary. This binary can + then contain the whole of the system, and so the resulting binary will be able to produce + binary code when it runs. With small changes to the linking process (easy in ruby!) it can + then extend itself. +

+

+ The type aspect is more tricky: Ruby is not typed, whereas soml is. And if everything + were objects (as we like to pretend in ruby) we could just do a lot of dynamic checking, + possibly later introduce some caching. But everything is not an object, minimally integers + are not, but maybe also floats and other values. The distinction between what is an integer + and what is an object has sprouted an elaborate type system, which is (by necessity) present in + soml (see there). +

+

+ The idea (because it hasn't been implemented yet) is to have different functions for different + types. The soml layer defines object layout and types and also lets us return to different + places from a function (in effect a soml function call is like an if). By using this, we can + compile a single ruby method into several soml methods. Each such method is typed, ie all + arguments and variables are of known type. According to these types we can call methods according + to their signatures. Also we can autogenerate error methods for unhandled types, and predict + that only a fraction of the possible combinations will actually be needed.
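The idea of one ruby method becoming several typed implementations can be sketched in plain ruby. This is only an illustration under loud assumptions: the `TypedMethods` class, the `:int` / `:ref` type names and the lookup at call time are all hypothetical, and in soml the matching would happen statically rather than at runtime as it does here.

```ruby
# Hypothetical registry, for illustration only. It keeps one
# implementation per (name, argument types) signature.
class TypedMethods
  def initialize
    @implementations = {}
  end

  # Register an implementation for one combination of argument types.
  def define(name, types, &body)
    @implementations[[name, types]] = body
  end

  # Pick the implementation matching the types of the given arguments.
  # A missing combination stands in for an autogenerated error method.
  def call(name, *args)
    types = args.map { |arg| type_of(arg) }
    body = @implementations.fetch([name, types]) do
      raise TypeError, "no #{name} for #{types.inspect}"
    end
    body.call(*args)
  end

  private

  # Only two types are distinguished: integers and object references.
  def type_of(value)
    value.is_a?(Integer) ? :int : :ref
  end
end

registry = TypedMethods.new
registry.define(:plus, [:int, :int]) { |a, b| a + b }
registry.define(:plus, [:ref, :ref]) { |a, b| "#{a}#{b}" }

puts registry.call(:plus, 1, 2)
puts registry.call(:plus, "so", "ml")
```

Only the two combinations that are actually used get an implementation here, which mirrors the expectation that only a fraction of the possible type combinations will be needed.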

- -
-
-
Builtin
-

- Builtin is the part of the vm that can not be coded in ruby. It is not, as may be imagined, a set of instructions, - but rather a set of modules. -
- Modules of Builtin have functions that implement functionality that can not be coded in ruby. Ie array access. - The functions take a VM::Method and provide the code as a set of instructions. This may be seen as the assembler - layer if the vm. -

-
-
-
diff --git a/soml/layers.html b/soml/layers.html
new file mode 100644
index 0000000..28c203d
--- /dev/null
+++ b/soml/layers.html
@@ -0,0 +1,154 @@
+---
+layout: salama
+title: Salama architectural layers
+---
+
+
+
+
+

Main Layers

+

+ To implement an object system to execute object oriented languages takes a large system. + The parts or abstraction layers are detailed below.
+ It is important to understand the approach first though, as it differs from the normal + interpretation. The idea is to compile (eg) ruby. It may be easiest to compare to a static + object oriented language like c++. When c++ was created, c++ code was translated into c, which + was then translated into assembler, which was translated to binary code, which was linked + and executed. Compiling to binaries is what gives these languages speed, and is one reason + to compile ruby.
+ In a similar way to the c++ example, we need a language between ruby and assembler, as it is too + big a mental step from ruby to assembler. Of course one could try to compile to c, but + since c is not object oriented that would mean dealing with all of c's non oo heritage, like + linking model, memory model, calling convention etc. (more on this in the book)
+ The layers are: +

    +
  • Binary and cpu specific assembler. This includes arm assembly and elf support + to produce a binary that can then read in ruby programs
  • +
  • Risc register machine abstraction provides a level of machine abstraction, but + as the name says, quite a simple one.
  • +
  • Soml, Salama object machine language, which is like our object c. Statically + typed object oriented with object oriented call semantics.
  • +
  • Salama, which is the layer compiling ruby code into soml and includes + bootstrapping code
  • +
+

+
+
+ +
+
+
Binary , Arm and Elf
+

+ A physical machine will run binaries containing instructions that the cpu understands. With arm + being our main target, this means we need code to produce binary, which is contained in a + separate module salama-arm .
+ To be able to run code on a unix based operating system, binaries need to be packaged in a + way that the os understands, so minimal elf support is included in the package.
+ Arm is a risc architecture but, as anyone who knows it will attest, one with its own quirks. + For example any instruction may be executed conditionally in arm. And there is no 32bit + register load instruction. It is possible to create very dense code using all the arm + special features, but this is not implemented yet. +

+
+
+ +
+
+
Register Machine
+

+ The Register machine layer is a relatively close abstraction of risc hardware, but without the + quirks. +
+ The register machine has registers, indexed addressing, operators, branches and everything + needed for the next layer. It does not try to abstract every possible machine feature + (like llvm), but rather "objectifies" the risc view to provide what is needed for soml, the + next layer up. +
+ The machine has its own (abstract) instruction set, and the mapping to arm is quite + straightforward. Since the instruction set is implemented as derived classes, additional + instructions may be defined and used later, as long as translation is provided for them too. + In other words the instruction set is extensible (unlike cpu instruction sets). +

+

+ Basic object oriented concepts are needed already at this level, to be able to generate a whole + self contained system. Ie what an object is, a class, a method etc. This minimal runtime is called + parfait and will be coded in soml eventually. But since it is the same objects at runtime and + compile time, it will then be translated back to ruby for use at compile time. Currently there + are two versions of the code, in ruby and soml, being hand synchronized. More about parfait below. +

+

+ Since working at this low machine level (essentially assembler) is not easy to follow for + everyone, an interpreter was created. Later a graphical interface, a kind of + visual debugger was added. + Visualizing the control flow and being able to see values updated immediately helped + tremendously in creating this layer. And the interpreter helps in testing, ie keeping it + working in the face of developer change. +

+
+
+ +
+
+
Soml, Salama object machine language
+

+ Soml is probably the largest single part of the system and much more information can be found + here . +
+ Before soml, a more traditional virtual machine approach was taken and abandoned. The language + is easy to understand and provides a good abstraction, both in terms of object orientation, + and in terms of how this is expressed in the register model.
+ It is like ruby without the dynamic aspects, but typed.
+ In broad strokes it consists of: +

    +
  • Parser: Currently a peg parser, though a hand coded one is planned. + The result of which is an AST
  • +
  • Compiler: compiles the ast into a sequence of Register instructions + and runtime objects (classes, methods etc)
  • +
  • Parfait: Is the runtime, ie the minimal set of objects needed to + create a binary with the required information to be dynamic
  • +
  • Builtin: A very small set of primitives that are impossible to express + in soml (remembering that parfait will be expressed in soml eventually)
  • +
+

+
+
+ +
+
+
Salama
+

+

+

+
+
+ +
+ +
+
+
Parfait
+

+ Ruby is very dynamic, and so it has a relatively large run-time. Parfait is that Run-time. +
+ Parfait includes all the functionality a ruby program could not do without, Array, Hash, Object, Class, etc. +
+ Parfait does not include any stdlib or indeed core functionality if it doesn't have to. +
+ Parfait is coded in ruby, but not all functionality can be coded in ruby, so there is Builtin +

+
+
+ +
+
+
Builtin
+

+ Builtin is the part of the vm that can not be coded in ruby. It is not, as may be imagined, a set of instructions, + but rather a set of modules. +
+ Modules of Builtin have functions that implement functionality that can not be coded in ruby. Ie array access. + The functions take a VM::Method and provide the code as a set of instructions. This may be seen as the assembler + layer of the vm. +

+
+