improve readmes
parent
1fc3f1fb18
commit
35b738639b
@ -21,38 +21,50 @@ transformation and optimisation passes on the stream to improve it.
|
||||
|
||||
Each ast class gets a compile method that does the compilation.
|
||||
|
||||
#### Method Definition and Instructions
|
||||
#### Compiled Method and Instructions
|
||||
|
||||
The first argument to the compile method is the CompiledMethod. All code is encoded as a stream of Instructions in the
|
||||
CompiledMethod. In fact Instructions are a linked list and so the CompiledMethod only holds the head, and the current
|
||||
insertion point.
|
||||
CompiledMethod. Instructions are stored as a list of Blocks, and Blocks are the smallest unit of code, which is
|
||||
always linear.
|
||||
|
||||
Code is added to the method (using add()), rather than working with the actual instructions. This is so each compile method
can just do its bit and be unaware of the larger structure that is being created. The general structure of the instructions
is a graph (what with ifs and whiles and breaks and whatnot), but we build it to have one start and *one* end (return).
|
||||
Code is added to the method (using add_code), rather than working with the actual instructions. This is so each
compiling method can just do its bit and be unaware of the larger structure that is being created.
The general structure of the instructions is a graph (what with ifs and whiles and breaks and whatnot),
but we build it to have one start and *one* end (return).
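
As a rough illustration of the paragraph above, here is a minimal sketch of a method that holds a list of linear Blocks and only exposes `add_code` to the compiling methods. The class bodies, the `new_block` helper and the symbols used as instructions are assumptions made up for this sketch, not the project's actual code.

```ruby
# Hypothetical sketch, not the project's actual classes.
class Block
  attr_reader :name, :codes

  def initialize(name)
    @name = name
    @codes = []            # strictly linear: no branching inside a block
  end

  def add_code(instruction)
    @codes << instruction
    self
  end
end

class CompiledMethod
  attr_reader :name, :blocks

  def initialize(name)
    @name = name
    # One start and *one* end (return), as described above.
    @blocks = [Block.new(:enter), Block.new(:return)]
    @current = @blocks.first
  end

  # Compiling methods only call this; they never touch the block list.
  def add_code(instruction)
    @current.add_code(instruction)
    self
  end

  # Assumed helper: open a new block just before the single return block
  # and make it the current insertion point.
  def new_block(name)
    block = Block.new(name)
    @blocks.insert(-2, block)
    @current = block
    block
  end
end

compiled = CompiledMethod.new(:foo)
compiled.add_code(:load_self)      # symbols stand in for real Instructions
compiled.new_block(:if_true)
compiled.add_code(:set_result)
```

Whatever branching the source has, the caller of `add_code` only sees "the current place to put code", while the method keeps a single entry and a single return block.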
|
||||
|
||||
|
||||
#### Messages and frames
|
||||
|
||||
The virtual machine instructions obviously operate on the virtual machine. Since the machine is virtual, we have to define
|
||||
it, and since it is oo we define it in objects.
|
||||
The virtual machine instructions obviously operate on the virtual machine. Since the machine is virtual,
|
||||
we have to define it, and since it is oo we define it in objects.
|
||||
|
||||
Also it is important to define how instructions operate, which in a physical machine is by changing the contents of registers or
some stack.
|
||||
Also it is important to define how instructions operate, which in a physical machine would be by changing
the contents of registers or some stack.
|
||||
|
||||
Our machine is not a register machine, but an object machine: it operates directly on objects and also has no stack.
|
||||
Our machine is not a register machine, but an object machine: it operates directly on objects and also has no separate
stack, only objects. There are a number of objects which are accessible, and one can think of these (their addresses)
as register contents. And one wouldn't be far off, as that is the implementation.
|
||||
|
||||
When a Method needs to make a call, or send a message, it creates a Message object. Messages contain return addresses and
arguments.
|
||||
The objects the machine works on are:
|
||||
|
||||
Then the machine must find the method to call.
|
||||
- Message
|
||||
- Frame
|
||||
- Self
|
||||
- NewMessage
|
||||
|
||||
Working on means that these are the only objects which the machine accesses, i.e. all others would have to be moved first.
|
||||
|
||||
When a Method needs to make a call, or send a message, it creates a new Message object.
|
||||
Messages contain return addresses and arguments.
|
||||
|
||||
Then the machine must find the method to call. This is a function of the virtual machine and is implemented in ruby.
|
||||
|
||||
Then a new Method receives the message, creates a Frame for local and temporary variables and continues execution.
|
||||
|
||||
The important thing here is that Messages and Frames are normal objects.
|
||||
|
||||
And interestingly we can partly use ruby to find the method, so in a way it is not just a top down transformation, but
|
||||
the sending goes back up and then down again.
|
||||
And interestingly we can partly use ruby to find the method, so in a way it is not just a top down transformation.
|
||||
Instead the sending goes back up and then down again.
|
||||
|
||||
The Message object is the second parameter to the compile method, the run-time part as it were. Why? Since it only
|
||||
exists at runtime: to make compile time analysis possible. Especially for those times when we can resolve the method
|
||||
|
@ -11,7 +11,9 @@ And i finally came to the conclusion that Parfait is the ruby runtime. Aha
|
||||
|
||||
Run - time
|
||||
|
||||
not compile - time
|
||||
not
|
||||
|
||||
compile - time
|
||||
|
||||
always mixing those up: As such it is not loaded at compile time.
|
||||
|
||||
@ -22,17 +24,20 @@ And thus parfait can be used at run-time.
|
||||
|
||||
It's too simple: just slips off the mind like a fish into water.
|
||||
|
||||
Parfait has a brother, the Builtin module. Builtin contains everything that can not be coded in ruby, but we still need
|
||||
(things like array access).
|
||||
|
||||
#### Example: Message send
|
||||
|
||||
I felt a little stupid that it took me so long to notice that sending a message is very closely related to the
|
||||
It felt a little stupid that it took me so long to notice that sending a message is very closely related to the
|
||||
existing ruby method Object.send
|
||||
|
||||
Of course object.send takes a symbol and the arguments and has the receiver, so all the elements of our Message are there.
|
||||
Of course Object.send takes a symbol and the arguments and has the receiver, so all the elements of our Message are there.
|
||||
And the process that Object.send needs to do is exactly that: send that message, ie find the correct method according to
|
||||
the old walk up the inheritance tree rules and dispatch it.
|
||||
|
||||
And as all this happens at runtime, "all" we have to do is code this logic. And since it is at runtime, we can do it in ruby
(as i said, this gets compiled and run, just like the program).
|
||||
And as all this happens at runtime, "all" we have to do is code this logic. And since it is at runtime,
we can do it in ruby (as i said, this gets compiled and run, just like the program).
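
To make the "walk up the inheritance tree and dispatch" logic concrete, here is a minimal sketch in plain ruby. The helper names `resolve_method` and `dispatch` and the `Animal`/`Dog` classes are made up for illustration; the sketch ignores singleton classes and `method_missing`, so it is a simplification of what `Object#send` really does, not a replacement for it.

```ruby
# Walk the ancestor chain and return the first class or module that
# defines the method itself (hypothetical helper, for illustration only).
def resolve_method(receiver, name)
  receiver.class.ancestors.each do |klass|
    defined_here = klass.instance_methods(false).include?(name) ||
                   klass.private_instance_methods(false).include?(name)
    return klass.instance_method(name) if defined_here
  end
  nil
end

# Send the message: receiver, symbol and arguments are all we need.
def dispatch(receiver, name, *args)
  found = resolve_method(receiver, name)
  raise NoMethodError, "undefined method #{name}" unless found
  found.bind(receiver).call(*args)
end

class Animal
  def speak(sound)
    "#{self.class.name} says #{sound}"
  end
end

class Dog < Animal
end

dog = Dog.new
same = dispatch(dog, :speak, "woof") == dog.send(:speak, "woof")
```

Here `Dog` does not define `speak` itself, so the lookup moves on to `Animal`, which is the walk the text describes.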
|
||||
|
||||
But what about the infinite loop problem:
|
||||
|
||||
|
@ -1,36 +1,49 @@
|
||||
Register Machine
|
||||
===============
|
||||
|
||||
This is the logic that uses the generated ast to produce code, using the asm layer.
|
||||
This is the logic that uses the compiled virtual object space to produce code and an executable binary.
|
||||
|
||||
Apart from shuffling things around from one layer to the other, it keeps track of registers and
|
||||
provides the stack glue. All the stuff a compiler would usually do.
|
||||
There is a mechanism for an actual machine (derived class) to generate machine specific instructions (as the
|
||||
plain ones in this directory don't assemble to binary). Currently there is only the Arm module to actually do
|
||||
that.
|
||||
|
||||
Also all syscalls are abstracted as functions.
|
||||
The elf module is used to generate the actual binary from the final BootSpace. BootSpace is a virtual class representing
|
||||
all objects that will be in the executable. Other than CompiledMethods, objects get transformed to data.
|
||||
|
||||
The Salama Convention
|
||||
----------------------
|
||||
But CompiledMethods, which are made up of Blocks, are compiled into a stream of bytes, which are the binary code for the
|
||||
function.
|
||||
|
||||
Since we're not in c, we use the registers more suitably for our job:
|
||||
Virtual Objects
|
||||
----------------
|
||||
|
||||
- return register is _not_ the same as passing registers
|
||||
- we pin one more register (ala stack/fp) for type information (this is used for returns too)
|
||||
- one line (8 registers) can be used by a function (caller saved?)
|
||||
- rest are scratch and may not hold values during call
|
||||
There are four virtual objects that are accessible (we can access their variables):
|
||||
|
||||
For Arm this works out as:
|
||||
- 0 type word (for the line)
|
||||
- 1-6 argument passing + workspace
|
||||
- 7 return value
|
||||
- Self
|
||||
- Message (arguments, method name, self)
|
||||
- Frame (local and tmp variables)
|
||||
- NewMessage ( to build the next message sent)
|
||||
|
||||
This means syscalls (using 7 for call number and 0 for return) must shuffle a little, but there's space to do it.
|
||||
Some more detail:
|
||||
These are pretty much the first four registers. When the code goes from virtual to register, we use register instructions
|
||||
to replace virtual ones.
|
||||
|
||||
1 - returning in the same register as passing makes that one register a special case, which i want to avoid. shuffling it gets tricky and involves 2 moves for what?
|
||||
As i see it the benefits of reusing the same register are one more argument register (not needed) and easy chaining of calls, which doesn't really happen so much.
|
||||
On the plus side, not using the same register makes saving and restoring registers easy (to implement and understand!).
|
||||
An easy to understand policy is worth gold, as register mistakes are HARD to debug and not what i want to spend my time with just now. So that's settled.
|
||||
Eg: A Virtual::Set can move data around inside those objects. And since in Arm this can not be done in one instruction,
|
||||
we use two, one to move to an unused register and then into the destination. And then we need some fiddling of bits
|
||||
to shift the type info.
|
||||
|
||||
2 - Tagging integers like MRI/BB is a hack which does not extend to other types, such as floats. So we don't use that and instead carry type information externally to the value. This is a burden of course, but then so is tagging.
|
||||
The convention (to make it easier) is to handle data in lines (8 words) and have one of them carry the type info for the other 7. This is also the object layout and so we reuse that code on the stack.
|
||||
Another simple example is a Call. In the simple case of a Class function call, the class object and, with the
method name, the function to be called can both be resolved at compile-time. And so this results in a Register::Call,
which is an Arm instruction.
|
||||
|
||||
A C call
|
||||
---------
|
||||
|
||||
Ok, there are no c calls. But syscalls are very similar. This is not at all as simple as the nice Class call described
|
||||
above.
|
||||
|
||||
For syscall in Arm (linux) you have to load registers 0-x (depending on call), load R7 with the syscall number and then
|
||||
issue the software interrupt instruction. If you get something back, it's in R0.
|
||||
|
||||
In short, lots of shuffling. And to make it fit with our four object architecture, we need the Message to hold the data
|
||||
for the call and Sys (module) to be self. And then the actual functions do the shuffle, saving the data and restoring it.
|
||||
And setting type information according to kernel documentation (as there is no runtime info).
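
The shuffle can be pictured with a toy register file in ruby. This is only a simulation under stated assumptions: the helper name `shuffle_for_syscall`, the eight-element array used as registers and the block that stands in for the kernel are all made up here, and nothing in it emits Arm code. The call number 4 is meant as `write` on 32-bit Arm Linux and is likewise only illustrative.

```ruby
SYSCALL_REGISTER = 7   # R7 carries the call number (as described above)
RESULT_REGISTER  = 0   # R0 carries whatever comes back

# Save the registers, load R0..Rx with the arguments and R7 with the call
# number, "trap" into the kernel (the block), then restore the registers.
def shuffle_for_syscall(registers, number, args)
  saved = registers.dup
  args.each_with_index { |arg, index| registers[index] = arg }
  registers[SYSCALL_REGISTER] = number
  registers[RESULT_REGISTER] = yield(registers)   # stands in for the software interrupt
  result = registers[RESULT_REGISTER]
  registers.replace(saved)                        # the caller's values are back in place
  result
end

registers = [:type_word, :arg_a, :arg_b, nil, nil, nil, nil, :return_value]
seen_by_kernel = nil
written = shuffle_for_syscall(registers, 4, [1, "hello\n", 6]) do |regs|
  seen_by_kernel = regs.dup
  regs[2]   # pretend kernel: report the byte count that was passed in R2
end
```

After the call the register array holds its original contents again, and only the result is handed back to the caller, which is the "saving the data and restoring it" part of the shuffle.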
|
||||
|
@ -6,10 +6,10 @@ Hence the need for a code/object file format (remember an oo program is just obj
|
||||
|
||||
I started with yaml, which is nice in that it has a solid implementation, reads and writes, handles arbitrary objects, handles graphs and is a sort of readable text format.
|
||||
|
||||
But the sort of started to get to me, because
|
||||
1) it's way too verbose (long files, object groups over many pages) and
|
||||
2) does not allow for (easy) ordering.
|
||||
Also it was placing references in weird (first seen) places.
|
||||
But the "sort of" started to get to me, because
|
||||
|
||||
- 1) it's way too verbose (long files, object groups over many pages) and
|
||||
- 2) does not allow for (easy) ordering.
|
||||
|
||||
To fix this i started on Sof, with an eye to expand it.
|
||||
|
||||
@ -19,7 +19,7 @@ The main starting goal was quite like yaml, but with
|
||||
- also short versions of arrays and hashes
|
||||
- Shorter class names (no ruby/object or even ruby/struct stuff)
|
||||
- references at the most shallow level
|
||||
- a possibility to order attributes and specify attributes that should not be serialized
|
||||
- an easy way to order attributes and specify attributes that should not be serialized
|
||||
|
||||
### Salama Object File
|
||||
|
||||
|
54
lib/virtual/README.md
Normal file
@ -0,0 +1,54 @@
|
||||
### Virtual OO Machine
|
||||
|
||||
This is really an OV (object value) machine, not an object oriented one.
|
||||
|
||||
Integers and References are Values. We make them look like objects, sure, but they are not.
|
||||
Symbols have similar properties and those are:
|
||||
|
||||
- equality means identity
- no change over lifetime
|
||||
|
||||
It's like with Atoms: they used to be the smallest possible physical unit. Now we have electrons, protons and neutrons.
And so objects are made up of Values (not objects): integers, floats, references and possibly more.
|
||||
|
||||
Values have type in the same way objects have a class. We keep track of the type at runtime.
|
||||
|
||||
### Layers
|
||||
|
||||
*Ast* instances get created by the salama-reader gem from source. Here we add compile functions to ast classes and
compile the ast layer into Virtual:: objects.
|
||||
|
||||
The main objects are BootSpace (lots of objects), BootClass (represents a class),
|
||||
CompiledMethod (with Blocks and Instructions).
|
||||
|
||||
*Virtual* Instructions get further transformed into *register* instructions. This is done by an abstractly defined
Register Machine with basic Instructions. A concrete implementation (like Arm) derives and creates derived
Instructions.
|
||||
|
||||
The final transformation assigns Positions to all boot objects (Linker) and assembles them into a binary representation.
The data part is then a representation of classes in the *parfait* runtime. And the instructions make up the
functions.
|
||||
|
||||
### Accessible Objects
|
||||
|
||||
Object oriented systems have data hiding. So we have access to the inner state of only four objects:
|
||||
|
||||
- Self
|
||||
- Message (arguments, method name, self)
|
||||
- Frame (local and tmp variables)
|
||||
- NewMessage ( to build the next message sent)
|
||||
|
||||
A single instruction (Set) allows movement of data between these. There are compare, branch and call instructions too.
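
A small sketch of what "one instruction moves data between four accessible objects" could look like, with the objects modelled as plain hashes of slots. `Slot`, `Machine` and its `set` method are hypothetical names for this illustration and do not claim to match Virtual::Set.

```ruby
# A slot names one of the accessible objects and a variable inside it.
Slot = Struct.new(:object, :name)

class Machine
  ACCESSIBLE = [:self, :message, :frame, :new_message].freeze

  attr_reader :objects

  def initialize
    # Each accessible object is modelled as a hash of its variables.
    @objects = ACCESSIBLE.map { |name| [name, {}] }.to_h
  end

  # The one data movement instruction: copy a value from one slot to another.
  # Anything outside the four objects is rejected, it would have to be moved
  # into one of them first.
  def set(from, to)
    [from, to].each do |slot|
      next if ACCESSIBLE.include?(slot.object)
      raise ArgumentError, "#{slot.object} is not accessible"
    end
    @objects[to.object][to.name] = @objects[from.object][from.name]
  end
end

machine = Machine.new
machine.objects[:frame][:tmp] = 42
# Build the next message: move a local from the Frame into the NewMessage.
machine.set(Slot.new(:frame, :tmp), Slot.new(:new_message, :arg0))
```

Restricting `set` to the four names is the data hiding from the text: the machine never reaches into any other object directly.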
|
||||
|
||||
### Micro
|
||||
|
||||
The micro-kernel idea is well stated by: If you can leave it out, do.
|
||||
|
||||
|
||||
As such we are aiming for integer and reference (type) support, and a minimal class system
|
||||
(object/class/array/hash/string).
|
||||
|
||||
*Parfait* is that part of the runtime that can be coded in ruby. It is parsed, like any other code and always included
|
||||
in the resulting binary. Builtin is the part of the runtime that can not be coded in ruby (but is still needed). This
|
||||
is coded by constructing CompiledMethods in code and is necessarily machine dependent.
|
||||
|