---
layout: rubyx
title: RubyX architectural layers
---
## Main Layers
Implementing an object system that can execute object oriented languages takes a large system.
The parts, or abstraction layers, are detailed below.
It is important to understand the approach first though, as it differs from the normal
interpretation. The idea is to **compile** ruby. The argument is often made that
typed languages are faster, but I don't believe that. I think dynamic languages
just push more functionality into the "virtual machine", and it is in fact only the
compiling to binaries that gives static languages their speed. This is the reason
to compile ruby.

![Architectural layers](/assets/layers.jpg)
### Ruby
To compile and run ruby, we first need to parse ruby. While parsing ruby is quite
a difficult task, it has already been implemented in pure ruby
[here](https://github.com/whitequark/parser). The output of the parser is
an AST, which holds information about the code in instances of a single *Node* class.
Nodes have a type (which you sometimes see in s-expressions) and a list of children.

There are two basic problems when working with a ruby AST: one is the A in AST, the other is ruby.

Since an abstract syntax tree only has one base class, one needs to employ the visitor
pattern to write a compiler. This ends up being one great class with lots of unrelated
functions, removing much of the benefit of OO.

The second, possibly bigger problem, is ruby itself: Ruby is full of programmer happiness,
three ways to do this, five to do that. To simplify that, remove the duplication and
make analysis easier, Vool was created.
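To make the first problem more concrete, here is a minimal sketch of the visitor style over generic nodes. The `Node` struct, the `s` helper and `SketchVisitor` are hypothetical stand-ins written for this page, not the parser gem's classes; they only assume what is said above, that a node has a type and a list of children.

```ruby
# Hypothetical stand-in for a parser node: a type plus a list of children.
Node = Struct.new(:type, :children)

# Small helper in the spirit of s-expression notation, e.g. s(:int, 1).
def s(type, *children)
  Node.new(type, children)
end

# One class with one handler per node type: the visitor style.
# Every new node type means another unrelated method in this class.
class SketchVisitor
  def process(node)
    handler = "on_#{node.type}"
    raise ArgumentError, "unhandled node type #{node.type}" unless respond_to?(handler)
    send(handler, node)
  end

  def on_int(node)
    node.children.first.to_s
  end

  def on_lvar(node)
    node.children.first.to_s
  end

  def on_send(node)
    receiver, name, *args = node.children
    "#{process(receiver)}.#{name}(#{args.map { |a| process(a) }.join(', ')})"
  end
end

ast = s(:send, s(:lvar, :a), :+, s(:int, 1))
puts SketchVisitor.new.process(ast) # prints a.+(1)
```

Even in this tiny form the handlers for integers, variables and sends have nothing in common except the class they live in, which is the loss of OO benefit mentioned above.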
### Virtual Object Oriented Language
Virtual, in this context, means that there is no syntax for this language; it is an
intermediate representation which *could* be targeted by several languages.

The main purpose is to simplify existing oo languages down to their core components: mostly
calling, assignment, continuations and exceptions. Typed classes for each language construct
exist and make it easier to transform a statement into a lower level representation.

Examples of things that exist in ruby but are broken down in Vool are *unless*, the ternary operator,
do while or for loops and other similar syntactic sugar.
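As a rough illustration of what "broken down" could mean, the sketch below lowers *unless* and the ternary operator into one typed if statement. All names here (`IfStatement`, `Call`, `Local`, `lower`) and the nested-array input format are assumptions made for the example; the real Vool classes may look quite different.

```ruby
# Hypothetical typed statement classes, one per core construct.
IfStatement = Struct.new(:condition, :if_true, :if_false)
Call        = Struct.new(:receiver, :name, :arguments)
Local       = Struct.new(:name)

# Lower a small s-expression (nested arrays) into typed statements.
def lower(sexp)
  type, *rest = sexp
  case type
  when :if, :ternary
    # A ternary is just an if with both branches present.
    cond, if_true, if_false = rest
    IfStatement.new(lower(cond), lower_optional(if_true), lower_optional(if_false))
  when :unless
    # unless is an if with the branches swapped.
    cond, body, else_body = rest
    IfStatement.new(lower(cond), lower_optional(else_body), lower_optional(body))
  when :call
    receiver, name, *args = rest
    Call.new(lower(receiver), name, args.map { |a| lower(a) })
  when :local
    Local.new(rest.first)
  else
    raise ArgumentError, "unknown construct #{type}"
  end
end

# A missing branch stays nil.
def lower_optional(sexp)
  sexp && lower(sexp)
end

stmt = lower([:unless, [:local, :done], [:call, [:local, :job], :run]])
puts stmt.class            # IfStatement
puts stmt.if_true.inspect  # nil
puts stmt.if_false.name    # run
```

After this step the later layers only ever see one kind of conditional, which is the simplification the paragraph above describes.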
### Minimal Object machine
We compile Vool statements into Mom instructions. Mom is a machine, which means it has
instructions. But unlike a cpu (or the risc layer below) it does not have memory, only objects.
It also has no registers, and together these two things mean that all information is stored in
objects. The calling convention is also object based and uses Frame and Message instances to
save state.

Objects are typed, and are in fact the same objects the language operates on. Just the
functionality is expressed through instructions. Methods are in fact defined (as vool) on classes
and then compiled to Mom/Risc/Arm and the results stored in the method object.

Compilation to Mom happens in two stages:

1. The linear statements/code is translated to Mom instructions.
2. Control statements are translated to jumps and labels.

The second step leaves a linked list of machine instructions as the input for the next stage.
In the future a more elaborate system of optimisations is envisioned between these stages.
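The result of the two stages can be pictured with a small sketch: linear statements become plain instructions, an if becomes a check, two labels and a jump, and everything ends up in one linked list. The class names (`Instruction`, `Label`, `Jump`, `FalseCheck`, `Simple`) and `compile_if` are hypothetical and only illustrate the shape of the output, not the actual Mom instruction set. For brevity the sketch does both stages in one function.

```ruby
# Hypothetical base class: every instruction links to the next one.
class Instruction
  attr_accessor :next

  # Append an instruction at the end of this chain, return the head.
  def <<(other)
    last = self
    last = last.next while last.next
    last.next = other
    self
  end

  # Walk the linked list and collect the instructions in order.
  def to_a
    list = []
    current = self
    while current
      list << current
      current = current.next
    end
    list
  end
end

class Label < Instruction
  attr_reader :name

  def initialize(name)
    @name = name
  end

  def to_s
    "#{name}:"
  end
end

class Jump < Instruction
  def initialize(label)
    @label = label
  end

  def to_s
    "jump #{@label.name}"
  end
end

# Jumps to the label when the condition is false.
class FalseCheck < Instruction
  def initialize(condition, label)
    @condition = condition
    @label = label
  end

  def to_s
    "unless #{@condition} jump #{@label.name}"
  end
end

# Stands in for any linear instruction (assignment, call, ...).
class Simple < Instruction
  def initialize(text)
    @text = text
  end

  def to_s
    @text
  end
end

# Linear code becomes Simple instructions, the if itself becomes labels and jumps.
def compile_if(condition, true_branch, false_branch)
  false_label = Label.new("false_label")
  merge_label = Label.new("merge_label")
  head = FalseCheck.new(condition, false_label)
  true_branch.each { |text| head << Simple.new(text) }
  head << Jump.new(merge_label)
  head << false_label
  false_branch.each { |text| head << Simple.new(text) }
  head << merge_label
  head
end

puts compile_if("a", ["b = 1"], ["b = 2"]).to_a.map(&:to_s)
```

The head of the list is all the next stage needs; it can follow `next` from there without knowing anything about the original statements.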
### Risc
The Register machine layer is a relatively close abstraction of risc hardware, but without the
quirks.

The Risc machine has registers, indexed addressing, operators, branches and everything
needed for the next layer. It does not try to abstract every possible machine feature
(like llvm), but rather "objectifies" the general risc view to provide what is needed for
the Mom layer, the next layer up.

The machine has its own (abstract) instruction set, and the mapping to arm is quite
straightforward. Since the instruction set is implemented as derived classes, additional
instructions may be defined and used later, as long as translation is provided for them too.
In other words the instruction set is extensible (unlike cpu instruction sets).

Basic object oriented concepts are needed already at this level, to be able to generate a whole
self contained system, ie what an object is, a class, a method etc. This minimal runtime is called
parfait, and the same objects will be used at runtime and compile time.

Since working at this low machine level (essentially assembler) is not easy to follow for
everyone (me :-), an interpreter was created (by me :-). Later a graphical interface, a kind of
[visual debugger](https://github.com/ruby-x/rubyx-debugger), was added.

Visualizing the control flow and being able to see values updated immediately helped
tremendously in creating this layer. And the interpreter helps in testing, ie keeping it
working in the face of developer change.
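A very reduced sketch of both ideas, an instruction set built from derived classes and an interpreter that walks it, might look like this. None of the names (`SketchInstruction`, `LoadConstant`, `Add`, `Double`, `SketchInterpreter`, `link`) come from the project; in particular `execute` stands in for whatever the real interpreter does per instruction.

```ruby
# Hypothetical base class: instructions are objects linked to the next one.
class SketchInstruction
  attr_accessor :next
end

class LoadConstant < SketchInstruction
  def initialize(register, value)
    @register = register
    @value = value
  end

  def execute(registers)
    registers[@register] = @value
  end
end

class Add < SketchInstruction
  def initialize(target, left, right)
    @target = target
    @left = left
    @right = right
  end

  def execute(registers)
    registers[@target] = registers[@left] + registers[@right]
  end
end

# An instruction added later, simply by deriving another class.
class Double < SketchInstruction
  def initialize(register)
    @register = register
  end

  def execute(registers)
    registers[@register] *= 2
  end
end

# Walks the linked list and keeps register values in a hash.
class SketchInterpreter
  attr_reader :registers

  def initialize
    @registers = Hash.new(0)
  end

  def run(instruction)
    while instruction
      instruction.execute(@registers)
      instruction = instruction.next
    end
    @registers
  end
end

# Chain the given instructions together and return the first.
def link(*instructions)
  instructions.each_cons(2) { |a, b| a.next = b }
  instructions.first
end

program = link(LoadConstant.new(:r1, 20),
               LoadConstant.new(:r2, 1),
               Add.new(:r0, :r1, :r2),
               Double.new(:r0))
puts SketchInterpreter.new.run(program)[:r0] # prints 42
```

Because the interpreter only relies on `next` and `execute`, the added `Double` class works without touching the interpreter, which is the kind of extensibility described above. Being able to inspect `registers` after every step is also what makes such an interpreter handy in tests.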
### Binary, Arm and Elf
A physical machine will run binaries containing instructions that the cpu understands, in a
format the operating system understands (elf). Arm and elf subdirectories hold the code for
these layers.

Arm is a risc architecture but, as anyone who knows it will attest, one with its own quirks.
For example any instruction may be executed conditionally in arm. Or there is no 32bit
register load instruction. It is possible to create very dense code using all the arm
special features, but this is not implemented yet.

All Arm instructions are (ie derive from) Register instructions, and there is an ArmTranslator
that translates RegisterInstructions to ArmInstructions.
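The translation step can be sketched on the 32bit load quirk, reading it as "a 32bit constant can not be loaded into a register with a single instruction". The sketch below splits such a load into a `movw`/`movt` pair, which exists on ARMv6T2 and later. This is an assumption chosen for the example, not necessarily what the real ArmTranslator emits, and `LoadConstant32` and `SketchArmTranslator` are hypothetical names. It produces assembler text rather than binary to stay short.

```ruby
# Hypothetical register-level instruction: load a 32 bit constant into a register.
LoadConstant32 = Struct.new(:register, :value)

# Hypothetical translator from register-level instructions to arm assembler text.
class SketchArmTranslator
  def translate(instruction)
    case instruction
    when LoadConstant32
      low  = instruction.value & 0xFFFF
      high = (instruction.value >> 16) & 0xFFFF
      # movw sets the lower 16 bits and clears the upper 16.
      code = ["movw #{instruction.register}, #0x#{low.to_s(16)}"]
      # movt sets the upper 16 bits, only needed when they are not zero.
      code << "movt #{instruction.register}, #0x#{high.to_s(16)}" unless high.zero?
      code
    else
      raise ArgumentError, "no translation for #{instruction.class}"
    end
  end
end

puts SketchArmTranslator.new.translate(LoadConstant32.new(:r0, 0x12345678))
# prints:
# movw r0, #0x5678
# movt r0, #0x1234
```

One abstract instruction turning into one or two arm instructions is typical for this layer: the quirks stay inside the translator and the Risc layer above does not have to know about them.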