improve readmes

Torsten Ruger 2014-08-28 19:12:46 +03:00
parent 1fc3f1fb18
commit 35b738639b
5 changed files with 133 additions and 49 deletions


@ -21,38 +21,50 @@ transformation and optimisation passes on the stream to improve it.
Each ast class gets a compile method that does the compilation.
#### Method Definition and Instructions
#### Compiled Method and Instructions
The first argument to the compile method is the CompiledMethod. All code is encoded as a stream of Instructions in the
CompiledMethod. In fact Instructions are a linked list and so the CompiledMethod only hold the head, and the current
insertion point.
CompiledMethod. Instructions are stored as a list of Blocks, and Blocks are the smallest unit of code, which is
always linear.
Code is added to the method (using add()), rather than working with the actual instructions. This is so each compile method
can just do it's bit and be unaware of the larger structure that is being created. The genearal structure of the instructions
is a graph (what with if's and whiles and breaks and what), but we build it to have one start and *one* end (return).
Code is added to the method (using add_code), rather than working with the actual instructions. This is so each
compiling method can just do its bit and be unaware of the larger structure that is being created.
The general structure of the instructions is a graph (what with ifs, whiles, breaks and so on),
but we build it to have one start and *one* end (return).
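A minimal sketch of that idea, not the project's actual classes: `BlockSketch`, `CompiledMethodSketch` and `new_block` are hypothetical names, and instructions are stood in for by plain symbols. It only illustrates how a compile method can append code without knowing about the surrounding block structure.

```ruby
# Hypothetical sketch; names and structure are assumptions, not the real API.
class BlockSketch
  attr_reader :name, :codes

  def initialize(name)
    @name = name
    @codes = [] # a block is linear: no branches inside it
  end
end

class CompiledMethodSketch
  attr_reader :blocks

  def initialize(name)
    @name = name
    # one start and *one* end (return)
    @blocks = [BlockSketch.new(:entry), BlockSketch.new(:return)]
    @current = @blocks.first
  end

  # Append an instruction at the current insertion point.
  def add_code(instruction)
    @current.codes << instruction
    self
  end

  # Open a new block just before the return block and make it current.
  def new_block(name)
    block = BlockSketch.new(name)
    @blocks.insert(@blocks.length - 1, block)
    @current = block
    block
  end
end

method = CompiledMethodSketch.new(:foo)
method.add_code(:load_self)
method.new_block(:while_body)
method.add_code(:compare)
```

The caller of `add_code` never touches `blocks` directly, which is the point of the paragraph above.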
#### Messages and frames
The virtual machine instructions obviously operate on the virtual machine. Since the machine is virtual, we have to define
it, and since it is oo we define it in objects.
The virtual machine instructions obviously operate on the virtual machine. Since the machine is virtual,
we have to define it, and since it is oo we define it in objects.
Also it is important to define how instructions, which is is in a ohysical machine by changing the contents of registers or
some stack.
Also it is important to define how instructions operate, which in a physical machine would be by changing
the contents of registers or some stack.
Our machine is ot a register machine, but an object machine: it operates directly on objects and also has no stack.
Our machine is not a register machine, but an object machine: it operates directly on objects and also has no separate
stack, only objects. There are a number of objects which are accessible, and one can think of these (their addresses)
as register contents. And one wouldn't be far off, as that is the implementation.
When a Method needs to make a call, or send a message, it creates a Message object. Messages contain return addresses and
arguemnts.
The objects the machine works on are:
Then the machine must find the method to call.
- Message
- Frame
- Self
- NewMessage
Working on means that these are the only objects which the machine accesses, i.e. all others would have to be moved first.
When a Method needs to make a call, or send a message, it creates a new Message object.
Messages contain return addresses and arguments.
Then the machine must find the method to call. This is a function of the virtual machine and is implemented in ruby.
Then a new Method receives the message, creates a Frame for local and temporary variables and continues execution.
The important thing here is that Messages and Frames are normal objects.
And interestingly we can partly use ruby to find the method, so in a way it is not just a top down transformation. but
the sending goes back up and then down again.
And interestingly we can partly use ruby to find the method, so in a way it is not just a top down transformation.
Instead the sending goes back up and then down again.
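To make the "Messages and Frames are normal objects" point concrete, here is a small sketch. The struct names and their attributes (`receiver`, `name`, `args`, `return_address`, `locals`, `temps`) are assumptions for illustration, not the project's actual layout.

```ruby
# Hypothetical sketch; attribute names are assumptions.
MessageSketch = Struct.new(:receiver, :name, :args, :return_address)
FrameSketch = Struct.new(:locals, :temps)

# A toy "method": it receives a message, creates its own frame for
# local and temporary variables, and then continues execution.
def receive(message)
  frame = FrameSketch.new({}, [])
  frame.locals[:sum] = message.args.inject(0, :+)
  frame.locals[:sum]
end

# The caller builds a new Message object before the call.
message = MessageSketch.new(:some_self, :add, [1, 2], :after_call)
result = receive(message)
```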
The Message object is the second parameter to the compile method, the run-time part as it were. Why? Since it only
exists at runtime: to make compile time analysis possible. Especially for those times when we can resolve the method


@ -11,7 +11,9 @@ And i finally came to the conclusion that Parfait is the ruby runtime. Aha
Run - time
not compile - time
not
compile - time
always mixing those up: As such it is not loaded at compile time.
@ -22,17 +24,20 @@ And thus parfait can be used at run-time.
It's too simple: just slips off the mind like a fish into water.
Parfait has a brother, the Builtin module. Builtin contains everything that cannot be coded in ruby, but we still need
(things like array access).
#### Example: Message send
I felt a little stupid that it took me so long to notice that sending a message is very closely relateed to the
I felt a little stupid that it took me so long to notice that sending a message is very closely related to the
existing ruby method Object.send
Off course object.send takes symbol and the arguments and has the receiver, so all the elements of our Messaage are there.
Of course Object.send takes a symbol and the arguments and has the receiver, so all the elements of our Message are there.
And the process that Object.send needs to do is exactly that: send that message, ie find the correct method according to
the old walk up the inheritance tree rules and dispatch it.
And as all this happens at runtime, "all" we have to do is code this logic. And since it is at runtime, we can do it in ruby
(as i said, this get's compiled and run, just like the program).
And as all this happens at runtime, "all" we have to do is code this logic. And since it is at runtime,
we can do it in ruby (as i said, this gets compiled and run, just like the program).
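The "walk up the inheritance tree" logic can be sketched in plain ruby. This is an illustration under simplifying assumptions, not Parfait's code: `find_method` and `dispatch` are hypothetical helpers, and the sketch ignores modules, singleton classes and method_missing.

```ruby
# Sketch only: look for the method in the class itself, then in each superclass.
def find_method(klass, name)
  while klass
    # instance_methods(false) lists only methods defined in klass itself
    return klass.instance_method(name) if klass.instance_methods(false).include?(name)
    klass = klass.superclass
  end
  nil
end

# Find the method for the receiver's class and call it with the arguments.
def dispatch(receiver, name, *args)
  found = find_method(receiver.class, name)
  raise NoMethodError, "undefined method #{name}" unless found
  found.bind(receiver).call(*args)
end

class Animal
  def speak
    "generic noise"
  end
end

class Dog < Animal
end

answer = dispatch(Dog.new, :speak)
```

Here `Dog` does not define `speak`, so the lookup moves one step up and finds it in `Animal`.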
But what about the infinite loop problem:


@ -1,36 +1,49 @@
Register Machine
===============
This is the logic that uses the generated ast to produce code, using the asm layer.
This is the logic that uses the compiled virtual object space to produce code and an executable binary.
Apart from shuffling things around from one layer to the other, it keeps track of registers and
provides the stack glue. All the stuff a compiler would usually do.
There is a mechanism for an actual machine (derived class) to generate machine specific instructions (as the
plain ones in this directory don't assemble to binary). Currently there is only the Arm module to actually do
that.
Also all syscalls are abstracted as functions.
The elf module is used to generate the actual binary from the final BootSpace. BootSpace is a virtual class representing
all objects that will be in the executable. Other than CompiledMethods, objects get transformed to data.
The Salama Convention
----------------------
But CompiledMethods, which are made up of Blocks, are compiled into a stream of bytes, which are the binary code for the
function.
Since we're not in c, we use the regsters more suitably for our job:
Virtual Objects
----------------
- return register is _not_ the same as passing registers
- we pin one more register (ala stack/fp) for type information (this is used for returns too)
- one line (8 registers) can be used by a function (caller saved?)
- rest are scratch and may not hold values during call
There are four virtual objects that are accessible (we can access their variables):
For Arm this works out as:
- 0 type word (for the line)
- 1-6 argument passing + workspace
- 7 return value
- Self
- Message (arguments, method name, self)
- Frame (local and tmp variables)
- NewMessage (to build the next message sent)
This means syscalls (using 7 for call number and 0 for return) must shuffle a little, but there's space to do it.
Some more detail:
These are pretty much the first four registers. When the code goes from virtual to register, we use register instructions
to replace virtual ones.
1 - returning in the same register as passing makes that one register a special case, which i want to avoid. shuffling it gets tricky and involves 2 moves for what?
As i see it the benefitd of reusing the same register are one more argument register (not needed) and easy chaining of calls, which doen't really happen so much.
On the plus side, not using the same register makes saving and restoring registers easy (to implement and understand!).
An easy to understand policy is worth gold, as register mistakes are HARD to debug and not what i want to spend my time with just now. So that's settled.
Eg: A Virtual::Set can move data around inside those objects. And since in Arm this cannot be done in one instruction,
we use two, one to move to an unused register and then into the destination. And then we need some fiddling of bits
to shift the type info.
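A rough sketch of that two-step lowering, with loud assumptions: the register numbers for the four objects, the scratch register, the word size and the `lower_set` helper are all made up for illustration, and the type-info bit fiddling is left out.

```ruby
# Hypothetical sketch; register assignment and helper name are assumptions.
OBJECT_REGISTERS = { self: 0, message: 1, frame: 2, new_message: 3 }.freeze
SCRATCH_REGISTER = 4 # assumed to be unused at this point
WORD_SIZE = 4        # bytes per word on 32-bit Arm

# Lower one virtual Set (slot of one object -> slot of another)
# into a load into the scratch register followed by a store.
def lower_set(from_object, from_slot, to_object, to_slot)
  from = OBJECT_REGISTERS.fetch(from_object)
  to = OBJECT_REGISTERS.fetch(to_object)
  [
    "ldr r#{SCRATCH_REGISTER}, [r#{from}, ##{from_slot * WORD_SIZE}]",
    "str r#{SCRATCH_REGISTER}, [r#{to}, ##{to_slot * WORD_SIZE}]"
  ]
end

instructions = lower_set(:message, 2, :new_message, 1)
```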
2 - Tagging integers like MRI/BB is a hack which does not extend to other types, such as floats. So we don't use that and instead carry type information externally to the value. This is a burden off course, but then so is tagging.
The convention (to make it easier) is to handle data in lines (8 words) and have one of them carry the type info for the other 7. This is also the object layout and so we reuse that code on the stack.
Another simple example is a Call. In the simple case of a Class function call, the class object and, with the
method name, the function to be called are resolved at compile-time. And so this results in a Register::Call,
which is an Arm instruction.
A C call
---------
Ok, there are no c calls. But syscalls are very similar. This is not at all as simple as the nice Class call described
above.
For a syscall on Arm (linux) you have to load registers 0-x (depending on call), load R7 with the syscall number and then
issue the software interrupt instruction. If you get something back, it's in R0.
In short, lots of shuffling. And to make it fit with our four object architecture, we need the Message to hold the data
for the call and Sys (module) to be self. And then the actual functions do the shuffle, saving the data and restoring it.
And setting type information according to kernel documentation (as there is no runtime info).
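The register loading described above can be sketched as follows. `syscall_sequence` is a hypothetical helper that only emits assembler text; the syscall numbers are the Linux Arm EABI ones for exit and write. Saving and restoring the four object registers, and setting type information, are not shown.

```ruby
# Hypothetical sketch; helper name and text output format are assumptions.
SYSCALL_NUMBERS = { exit: 1, write: 4 }.freeze # Linux Arm EABI

# args are operand strings, loaded into r0, r1, ... in order.
def syscall_sequence(name, args)
  moves = args.each_with_index.map { |arg, index| "mov r#{index}, #{arg}" }
  moves << "mov r7, ##{SYSCALL_NUMBERS.fetch(name)}" # call number goes in R7
  moves << "swi 0"                                   # software interrupt
end

exit_code = syscall_sequence(:exit, ["#0"])
```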


@ -6,10 +6,10 @@ Hence the need for a code/object file format (remember an oo program is just obj
I started with yaml, which is nice in that it has a solid implementation, reads and writes, handles arbitrary objects, handles graphs and is a sort of readable text format.
But the sort of started to get to me, because
1) it's way to verbose (long files, object groups over many pages) and
2) does not allow for (easy) ordering.
Also it was placing references in weird (first seen) places.
But the "sort of" started to get to me, because
- 1) it's way too verbose (long files, object groups over many pages) and
- 2) does not allow for (easy) ordering.
To fix this i started on Sof, with an eye to expand it.
@ -19,7 +19,7 @@ The main starting goal was quite like yaml, but with
- also short versions of arrays and hashes
- Shorter class names (no ruby/object or even ruby/struct stuff)
- references at the most shallow level
- a possibility to order attributes and specify attributes that should not be serialized
- an easy way to order attributes and specify attributes that should not be serialized
### Salama Object File

lib/virtual/README.md Normal file

@ -0,0 +1,54 @@
### Virtual OO Machine
This is really an OV (object value) machine, not an object oriented one.
Integers and References are Values. We make them look like objects, sure, but they are not.
Symbols have similar properties and those are:
- equality means identity
- no change over lifetime
It's like with Atoms: they used to be the smallest possible physical unit. Now we have electrons, protons and neutrons.
And so objects are made up of Values (not objects): integers, floats, references and possibly more.
Values have type in the same way objects have a class. We keep track of the type at runtime.
### Layers
*Ast* instances get created by the salama-reader gem from source. Here we add compile functions to ast classes and
compile the ast layer into Virtual:: objects.
The main objects are BootSpace (lots of objects), BootClass (represents a class),
CompiledMethod (with Blocks and Instructions).
*Virtual* Instructions get further transformed into *register* instructions. This is done by an abstractly defined
Register Machine with basic Instructions. A concrete implementation (like Arm) derives and creates derived
Instructions.
The final transformation assigns Positions to all boot objects (Linker) and assembles them into a binary representation.
The data part is then a representation of classes in the *parfait* runtime. And the instructions make up the
functions.
### Accessible Objects
Object oriented systems have data hiding. So we have access to the inner state of only four objects:
- Self
- Message (arguments, method name, self)
- Frame (local and tmp variables)
- NewMessage (to build the next message sent)
A single instruction (Set) allows movement of data between these. There are compare, branch and call instructions too.
### Micro
The micro-kernel idea is well stated by: If you can leave it out, do.
As such we are aiming for integer and reference (type) support, and a minimal class system
(object/class/array/hash/string).
*Parfait* is that part of the runtime that can be coded in ruby. It is parsed, like any other code and always included
in the resulting binary. Builtin is the part of the runtime that cannot be coded in ruby (but is still needed). This
is coded by constructing CompiledMethods in code and is necessarily machine dependent.