diff --git a/index.html b/index.html
index e42f9b9..39a7663 100755
--- a/index.html
+++ b/index.html
@@ -3,10 +3,11 @@ layout: site
 ---
- Leaving the old (c) world behind to go where no machine has gone before (or something like that)
+ A fully self-describing object system, without external dependencies, capable of executing dynamic
+ object oriented languages like ruby or python.
- Salama is maybe the first successful attempt at writing a virtual machine without the use
- of c or c tools.
- It defines and implements an object virtual machine completely in object oriented terms,
- using ruby to bootstrap itself.
+
+ The goal is to execute object oriented code without external dependencies, on modern hardware.
-
- Just some of the features, most of which would not be possible in c:
-
+ No external dependencies means a system that defines an object oriented system language
+ that compiles to assembler: a sort of object version of c, but without using c.
+
+
+ It must be possible to compile higher level, dynamic, object oriented languages into this
+ language, in the way that c++ was (at least originally) compiled into c. So ruby compiles
+ to soml, which compiles to assembler, which compiles to binaries. No interpretation.
+
+
+ Most of the system is defined in a higher level language (ruby), and only a small runtime,
+ mostly for operating system access, needs to be written in the system language.
+ A first version of the system language is now done.
+ The statically typed language is called SOML (salama object machine language). It has a roughly
+ ruby-ish syntax but c-ish semantics, and introduces several new concepts:
+
- While the project is just getting on two years, it is starting to settle conceptually,
- progress smoothly, and produce working binaries.
+ An abstract risc-like register level provides some abstraction from the actual hardware. The
+ compiler compiles to this level, and a mapping to Arm is provided to produce working binaries.
-
- In numbers, there are over 1000 commits, 6 sub-projects, more than 10k lines of code
- and well over 600 tests.
-
-
- Maybe more importantly there is good documentation along with an
- evolved idea of how most of the difficult issues are solved. So while the executables are
- still of the "Hello world" quality, there are no coneptual problems anymore.
-
-
+
There is also an interpreter (mostly for testing) and a basic visual debugger, which helps not only with debugging but also with understanding the machine.
- The short introduction is under the architecture menu.
+
+
+ The section on SOML gives an overview of the system language.
- The full documentation is in form of a gitbook and can be viewed ,
- and edited
+ The full documentation is in the form of a gitbook and can be viewed here.
The about section has some info of when and how this
diff --git a/salama/layers.html b/salama/layers.html
index 89b732a..7f9ef89 100644
--- a/salama/layers.html
+++ b/salama/layers.html
@@ -1,148 +1,162 @@
 ---
 layout: salama
-title: Salama, a simple and minimal oo machine
+title: Salama architectural layers
 ---
-
Just a small primer (left over from the start), really the book is the best starting point
-
- This is the easy to understand part, that's why it's first, it's the code
- from salama-arm , which is quite stable.
- It creates binary code according to the arm specs. All about shifting bits in the
- right way.
-
- As an abstraction it is not far away from assembler. I mapped the memnonics to function calls and the registers
- can be symbols or Values (from vm). But on the whole this is as low level as it gets.
-
- Different types of instructions are implemented by different classes. To make machine dependant code possible,
- those classes are derived from Vm versions.
-
- There is an intel directory which contains an expanded version of wilson, but it has yet to be made to fit into
- the architecture. So for now salama produces arm code.
-
- There is an elf directory wich builds actual executables, a mini implementation of the elf standard.
-
- Parsing is relatively straightforward too, it's code is found in
- it's own repository .
- It parses more than can be processed, but much less than ruby is.
-
- We all know ruby, so it's just a matter of getting the rules right.
- If only! Ruby is full of niceties that actually make parsing it quite difficult. But at the moment that story
- hasn't even started.
-
- Traditionally, yacc or bison or talk of lr or ll would come in here and all but a few would zone out. But llvm has
- proven that recursive descent parsing is a viable alternative, also for big projects. And Parslet puts that into
- a nice ruby framework for us.
-
- Parslet lets us use modules for parts of the parser, so those files are pretty self-explanitory.
- Not all is done, but a good start.
-
- Parslet also has a seperate Transformation pass, and that creates the AST. Those class names are also
- easy, so you can guess what an IfExpression represents.
-
+ Implementing an object system that can execute object oriented languages takes a large system.
+ The parts, or abstraction layers, are detailed below.
+ It is important to understand the approach first though, as it differs from the usual
+ approach of interpretation. The idea is to compile ruby (for example). It may be easiest to compare
+ this to a static object oriented language like c++. When c++ was created, c++ code was translated
+ into c, which was then translated into assembler, then into binary code, which was linked and
+ executed. Compiling to binaries is what gives these languages their speed, and is one reason
+ to compile ruby.
+ As in the c++ example, we need a language between ruby and assembler, as it is too
+ big a mental step from ruby to assembler. Of course one could try to compile to c, but
+ since c is not object oriented, that would mean dealing with all of c's non-oo heritage, like
+ its linking model, memory model, calling convention etc. (more on this in the book).
+ The layers are:
+
- The Virtual machine layer is where it gets interesting, but also a little fuzzy.
-
- After some trying around the virtual machine layer has become a completely self contained layer to describe and
- implement an oo machine. In other words it has no reference to any physical machine, that is the next layer down.
-
- One can get headaches quite easily while thinking about implementing an oo machine in oo, it's just so difficult to
- find the boundaries. To determine those, i like to talk of types (not classes) for the objects (values) in which the
- vm is implemented. Also it is neccessary to remove ambiguity about what message sending means.
-
- One way to think of this (helps to keep sane) is to think of the types of the system known at compile time. In the
- simplest case this could be object reference and integer. The whole vm functionality can be made to work with only
- those two types, and it is not specified how the type information is stored. but off course there needs to be a
- way to check it at run-time.
-
- The vm has an instruction set that, apart from basic integer manipulation, only alows for memory access into an
- object. Instead of an implicit stack, we use activation frames and store all variables explicitly.
+ A physical machine runs binaries containing instructions that the cpu understands. With arm
+ being our main target, this means we need code to produce arm binaries, which is contained in a
+ separate module, salama-arm.
+ To be able to run code on a unix based operating system, binaries need to be packaged in a
+ way that the os understands, so minimal elf support is included in the package.
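As a rough illustration of what minimal elf support involves, the sketch below packs the 52-byte ELF32 file header for a little-endian ARM executable with Ruby's `Array#pack`. The helper name `elf32_arm_header`, the single program header, the entry address and the `e_flags` value (ARM EABI version 5) are assumptions for this example, not salama's actual code; a runnable binary additionally needs a program header and the code itself.

```ruby
# Sizes fixed by the ELF32 specification.
ELF32_HEADER_SIZE = 52
ELF32_PHDR_SIZE = 32

# Hypothetical helper: pack an ELF32 file header for a little-endian ARM
# executable whose program header table directly follows the file header.
def elf32_arm_header(entry:, phnum: 1)
  # e_ident: magic, ELFCLASS32, ELFDATA2LSB, EV_CURRENT, then 9 zero bytes
  # (OS ABI 0 = System V, ABI version 0, padding).
  ident = [0x7F, "ELF", 1, 1, 1].pack("Ca3CCCx9")
  fields = [
    2,                 # e_type: ET_EXEC
    40,                # e_machine: EM_ARM
    1,                 # e_version: EV_CURRENT
    entry,             # e_entry: address of the first instruction
    ELF32_HEADER_SIZE, # e_phoff: program headers start right after this header
    0,                 # e_shoff: no section header table
    0x0500_0000,       # e_flags: ARM EABI version 5 (assumption)
    ELF32_HEADER_SIZE, # e_ehsize
    ELF32_PHDR_SIZE,   # e_phentsize
    phnum,             # e_phnum
    0,                 # e_shentsize: unused without sections
    0,                 # e_shnum
    0                  # e_shstrndx
  ]
  # v = 16-bit little endian, V = 32-bit little endian
  ident + fields.pack("vvVVVVVvvvvvv")
end

# Assumed layout: loaded at 0x10000, code right after the 52 + 32 header bytes.
header = elf32_arm_header(entry: 0x10054)
puts header.bytesize # prints 52
```

Packing with explicit little-endian directives (`v`, `V`) keeps the output independent of the host machine, which matters when the compiler runs on one architecture and produces binaries for arm.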
+ Arm is a risc architecture, but, as anyone who knows it will attest, one with its own quirks.
+ For example, any instruction may be executed conditionally in arm, and there is no single
+ instruction that loads an arbitrary 32-bit constant into a register. It is possible to create
+ very dense code using all the special arm features, but this is not implemented yet.
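To make these two quirks concrete: every ARM data-processing instruction carries a 4-bit condition field in bits 31..28, and an immediate operand is only an 8-bit value rotated right by an even number of bits, so an arbitrary 32-bit constant has to be built from several instructions. The sketch below, with hypothetical method names and not taken from salama-arm, encodes a `mov` followed by up to three `orr`s, one per non-zero byte. A real assembler may choose shorter sequences or load the value from a literal pool instead.

```ruby
# Condition codes live in bits 31..28 of every ARM instruction (subset).
COND = { eq: 0b0000, ne: 0b0001, al: 0b1110 }.freeze
# Data-processing opcodes, bits 24..21 (subset).
OPCODE = { mov: 0b1101, orr: 0b1100 }.freeze

# Encode "op rd, rn, #imm" with an immediate operand:
# cond | 00 | I=1 | opcode | S=0 | rn | rd | rotate | imm8
# The immediate is imm8 rotated right by 2 * rotate bits.
def encode_imm(op, rd, rn, imm8, rotate, cond: :al)
  (COND.fetch(cond) << 28) | (1 << 25) | (OPCODE.fetch(op) << 21) |
    (rn << 16) | (rd << 12) | (rotate << 8) | imm8
end

# Load an arbitrary 32-bit constant: one MOV for the first non-zero byte,
# then an ORR for each further non-zero byte. Every instruction gets the same
# condition, so the whole load can be made conditional.
def load_constant(rd, value, cond: :al)
  value &= 0xFFFF_FFFF
  return [encode_imm(:mov, rd, 0, 0, 0, cond: cond)] if value.zero?

  words = []
  4.times do |i|
    byte = (value >> (8 * i)) & 0xFF
    next if byte.zero?

    # Rotating right by (32 - 8 * i) bits moves the byte to bit 8 * i;
    # the 4-bit rotate field holds half of that amount.
    rotate = ((32 - 8 * i) % 32) / 2
    if words.empty?
      words << encode_imm(:mov, rd, 0, byte, rotate, cond: cond)
    else
      words << encode_imm(:orr, rd, rd, byte, rotate, cond: cond)
    end
  end
  words
end

puts load_constant(0, 0x0000_FF01).map { |w| format("%08X", w) }.join(" ")
# prints E3A00001 E3800CFF
```

Passing `cond: :eq` to the same call produces `03A00001` for the first word, the same `mov` that only executes when the zero flag is set.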
- Compilation happens in Passes. A single pass is a small piece of code to do just a very small part of the
- whole compilation.
-
- Logically there are four distinct steps. From the parsed AST we compile a datastructure that includes instructions
- for an object machine. The next step is a (still abstract) register machine, before the actual binary for the
- arm is generated.
-
- The Register machine layer is a relatively close abstraction of hardware.
+ The Register machine layer is a relatively close abstraction of risc hardware, but without the
+ quirks.
- The step from OO machine to Arm had proved to large, also partially due to the cryptic arm names.
+ The register machine has registers, indexed addressing, operators, branches and everything
+ needed for the next layer. It does not try to abstract every possible machine feature
+ (like llvm does), but rather "objectifies" the risc view to provide what is needed for soml, the
+ next layer up.
- The register machine has registers, indexed addressing, a pc and all the sort of normal things one would expect.
- The machine has it's own (abstract) instruction set, which serves mainly to give understandable names.
-
- The mapping to arm is quite straightforward.
+ The machine has its own (abstract) instruction set, and the mapping to arm is quite
+ straightforward. Since the instruction set is implemented as derived classes, additional
+ instructions may be defined and used later, as long as a translation is provided for them too.
+ In other words the instruction set is extensible (unlike cpu instruction sets).
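A minimal Ruby sketch of what an instruction set implemented as derived classes, extensible as long as a translation is provided, can look like. All class and method names here (`Instruction`, `LoadConstant`, `RegisterTransfer`, `Increment`, `ArmTranslator`) are hypothetical and not salama's API; translation emits assembler text rather than binary to keep it short.

```ruby
# Base class of the abstract register-machine instruction set (hypothetical names).
class Instruction; end

# Put a (small) constant into a register.
class LoadConstant < Instruction
  attr_reader :to, :value
  def initialize(to, value)
    @to = to
    @value = value
  end
end

# Copy one register into another.
class RegisterTransfer < Instruction
  attr_reader :from, :to
  def initialize(from, to)
    @from = from
    @to = to
  end
end

# Maps each instruction class to arm assembler text, via one
# translate_<classname> method per instruction class.
class ArmTranslator
  def translate(instruction)
    handler = "translate_#{instruction.class.name.downcase}"
    unless respond_to?(handler)
      raise NotImplementedError, "no arm translation for #{instruction.class}"
    end
    public_send(handler, instruction)
  end

  def translate_loadconstant(i)
    "mov r#{i.to}, ##{i.value}"
  end

  def translate_registertransfer(i)
    "mov r#{i.to}, r#{i.from}"
  end
end

# Extending the set later: a new instruction class plus its translation.
class Increment < Instruction
  attr_reader :register
  def initialize(register)
    @register = register
  end
end

class ArmTranslator
  def translate_increment(i)
    "add r#{i.register}, r#{i.register}, #1"
  end
end

translator = ArmTranslator.new
program = [LoadConstant.new(0, 41), Increment.new(0), RegisterTransfer.new(0, 1)]
puts program.map { |i| translator.translate(i) }
```

Because the translation is looked up per class, an instruction added without a matching `translate_` method fails loudly at translation time instead of silently producing wrong code.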
+ Basic object oriented concepts are needed already at this level, to be able to generate a whole
+ self-contained system, i.e. what an object, a class, a method etc. is. This minimal runtime is called
+ parfait and will eventually be coded in soml. But since the same objects exist at runtime and at
+ compile time, it will then be translated back to ruby for use at compile time. Currently there
+ are two versions of the code, in ruby and soml, which are kept in sync by hand. More about parfait below.
+
+
+ Since working at this low machine level (essentially assembler) is not easy to follow for
+ everyone, an interpreter was created. Later a graphical interface, a kind of
+ visual debugger, was added.
+ Visualizing the control flow and being able to see values update immediately helped
+ tremendously in creating this layer. And the interpreter helps in testing, i.e. keeping the layer
+ working as the code changes.
+
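What makes an interpreter useful at this level is being able to execute one instruction at a time and inspect the registers in between, which is also what a visual debugger needs. A minimal sketch, using a hypothetical tuple-based instruction format rather than salama's instruction classes, that sums 4 + 3 + 2 + 1 with a loop:

```ruby
# Minimal register-machine interpreter sketch. Instructions are hypothetical
# tuples of [operation, operands...]; registers are named by symbols.
class Interpreter
  attr_reader :registers, :pc

  def initialize(program)
    @program = program
    @registers = Hash.new(0) # unset registers read as 0
    @pc = 0
  end

  # Execute a single instruction. Returns false once the program counter has
  # run past the last instruction, so a debugger can stop or redraw per step.
  def step
    return false if @pc >= @program.length

    op, a, b, c = @program[@pc]
    @pc += 1
    case op
    when :load then @registers[a] = b                             # a = constant b
    when :add  then @registers[a] = @registers[b] + @registers[c] # a = b + c
    when :sub  then @registers[a] = @registers[b] - @registers[c] # a = b - c
    when :jump then @pc = a                                       # unconditional branch
    when :jump_if_zero then @pc = b if @registers[a].zero?        # conditional branch
    else raise ArgumentError, "unknown instruction #{op.inspect}"
    end
    true
  end

  # Run until the program ends.
  def run
    loop { break unless step }
    self
  end
end

# Sum n + (n - 1) + ... + 1 for n = 4; jump targets are instruction indices.
SUM_PROGRAM = [
  [:load, :n, 4],
  [:load, :one, 1],
  [:load, :sum, 0],
  [:jump_if_zero, :n, 7], # index 3: leave the loop when n is 0
  [:add, :sum, :sum, :n],
  [:sub, :n, :n, :one],
  [:jump, 3]
].freeze

puts Interpreter.new(SUM_PROGRAM).run.registers[:sum] # prints 10
```

Keeping `step` separate from `run` is the design choice that matters here: a test can run a program to completion, while a debugger can call `step` and redraw `registers` and `pc` after every instruction.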
+
+ Soml is probably the largest single part of the system, and much more information can be found
+ here.
+
+ Before soml, a more traditional virtual machine approach was taken and abandoned. The language
+ is easy to understand and provides a good abstraction, both in terms of object orientation
+ and in terms of how this is expressed in the register model.
+ It is like ruby without the dynamic aspects, but typed.
+ In broad strokes it consists of:
+
+ To summarize a few soml features that may be unusual:
+
- Ruby is very dynamic, and so it has a relatively large run-time. Parfait is that Run-time.
+ To compile and run ruby, we need to parse and compile ruby code. To compile ruby to soml, a clear
+ mapping has to be found. In particular, the dynamic aspects and typing need to be addressed.
- Parfait includes all the functionality a ruby program could not do without, Array, Hash, Object, Class, etc.
-
- Parfait does not include any stdlib or indeed core functionality if it doesn't have too.
-
- Parfait is coded in ruby, but not all functionality can be coded in ruby, so there is Builtin
+ While parsing ruby is quite a difficult task, it has already been implemented in pure ruby
+ here. The output of the parser is again
+ an ast, which needs to be compiled to soml.
+ The dynamic aspects of ruby are actually relatively easy to handle once the whole system is
+ in place, because the whole system is written in ruby without external dependencies.
+ Since (when finished) it can compile ruby, it can compile itself into a binary. This binary can
+ then contain the whole of the system, and so the resulting binary will be able to produce
+ binary code when it runs. With small changes to the linking process (easy in ruby!) it can
+ then extend itself.
+
+ The type aspect is trickier: ruby is not typed, and soml, after all, is. If everything
+ were an object (as we like to pretend in ruby), we could just do a lot of dynamic checking and
+ possibly introduce some caching later. But not everything is an object: at a minimum integers
+ are not, and maybe also floats and other values. The distinction between what is an integer
+ and what is an object has sprouted an elaborate type system, which is (by necessity) present in
+ soml (see there).
+
+
+ The idea (it hasn't been implemented yet) is to have different functions for different
+ types. The soml layer defines object layout and types, and also lets us return to different
+ places from a function (in effect a soml function call is like an if). Using this, we can
+ compile a single ruby method into several soml methods. Each such method is typed, i.e. all
+ arguments and variables are of a known type. Based on these types we can call methods according
+ to their signatures. We can also autogenerate error methods for unhandled types, and predict
+ that only a fraction of the possible combinations will actually be needed.
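Since this part is not implemented yet, the following is only a runtime analogy in plain Ruby, not how soml will do it: one logical method holds several variants keyed by argument classes, dispatch picks the variant matching the actual types, and an error variant is generated on demand for combinations nobody defined. The `TypedMethod` class and its methods are hypothetical.

```ruby
# Runtime analogy (hypothetical names): one logical method, several typed
# variants, chosen by the classes of the actual arguments.
class TypedMethod
  def initialize(name)
    @name = name
    @variants = {}
  end

  # Register the body for one combination of argument classes.
  def define(*types, &body)
    @variants[types] = body
    self
  end

  # Dispatch on the argument classes. A combination that was never defined
  # gets an error variant, generated only the first time it is actually used.
  def call(*args)
    signature = args.map(&:class)
    variant = (@variants[signature] ||= error_variant(signature))
    variant.call(*args)
  end

  private

  def error_variant(signature)
    name = @name
    ->(*) { raise TypeError, "no #{name} for (#{signature.join(', ')})" }
  end
end

plus = TypedMethod.new(:plus)
plus.define(Integer, Integer) { |a, b| a + b }
plus.define(String, String) { |a, b| a + b }
puts plus.call(1, 2)     # prints 3
puts plus.call("a", "b") # prints ab
begin
  plus.call(1, "b")
rescue TypeError => e
  puts e.message         # prints no plus for (Integer, String)
end
```

Generating error variants lazily mirrors the prediction in the text: only the combinations that actually occur ever get a variant, rather than every possible product of types.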
- Builtin is the part of the vm that can not be coded in ruby. It is not, as may be imagined, a set of instructions,
- but rather a set of modules.
-
- Modules of Builtin have functions that implement functionality that can not be coded in ruby. Ie array access.
- The functions take a VM::Method and provide the code as a set of instructions. This may be seen as the assembler
- layer if the vm.
-
+ +
+ Ruby is very dynamic, and so it has a relatively large runtime. Parfait is that runtime.
+
+ Parfait includes all the functionality a ruby program could not do without: Array, Hash, Object, Class, etc.
+
+ Parfait does not include any stdlib, or indeed core functionality, if it doesn't have to.
+
+ Parfait is coded in ruby, but not all functionality can be coded in ruby, so there is Builtin.
+
+ Builtin is the part of the vm that cannot be coded in ruby. It is not, as may be imagined, a set of instructions,
+ but rather a set of modules.
+
+ Modules of Builtin have functions that implement functionality that cannot be coded in ruby, such as array access.
+ The functions take a VM::Method and provide the code as a set of instructions. This may be seen as the assembler
+ layer of the vm.
+
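A hypothetical sketch of the Builtin idea for array access: a function receives a method object and fills in its body as a list of instructions instead of ruby code. `CompiledMethod` stands in for the `VM::Method` named in the text, and the instruction names are made up for illustration; they are not salama's instruction set.

```ruby
# Stand-in for the VM::Method the text mentions (hypothetical).
CompiledMethod = Struct.new(:name, :arguments, :instructions)

module Builtin
  # Functions here cannot be written in ruby, so they emit instructions directly.
  module Array
    # Fills in the body of `get(index)`: read the slot at `index` out of the
    # receiver's memory and return it.
    def self.get(method)
      method.instructions << [:load_argument, :index, 0]
      method.instructions << [:get_slot, :result, :self, :index]
      method.instructions << [:return, :result]
      method
    end
  end
end

get = Builtin::Array.get(CompiledMethod.new(:get, [:index], []))
get.instructions.each { |i| p i }
```

The point of the shape is that, from the outside, a builtin method looks like any other compiled method; only the way its instructions were produced differs.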