move md to haml

This commit is contained in:
Torsten Ruger
2018-04-10 19:50:07 +03:00
parent 4b927c4f29
commit b61bc7c7ad
121 changed files with 3301 additions and 8572 deletions

View File

@ -0,0 +1,30 @@
%p
Well, it has been a good holiday: two months in Indonesia, Bali, and diving Komodo. It brought
clarity, and so I have to start a daunting task.
%p
When I learned programming at university, they were still teaching Pascal. So when I got to choose
C++ for my first bigger project, that was a real step up. But even as I wrestled with templates, it was
Smalltalk that took my heart the moment I read about it. And I read quite a bit, including the Blue Book on its implementation.
%p
The next distinct step up was Java, in 1996, and then Ruby in 2001. I mostly stopped coding
in 2004, when I moved to the countryside and started our
%a{:href => "http://villataika.fi/en/index.html"} B&B
But then we needed web pages, and before long a POS for our shop, so I was back at the keyboard.
And since it was something I had long wanted to do, I wrote a database.
%p
Purple was my idea of an ideal data store: save by reachability, automatic loading by
traversal, and schema-free saving of any Ruby object. In memory, based on Judy, it did about 2000
transactions per second. Alas, it didn't have any searching.
%p
So I bit the bullet and implemented an SQL interface to it. After a failed attempt with Rails 2
and two major rewrites, I managed to integrate what by then was called Warp into Arel (Rails 3).
But while raw throughput was still about the same, once everything went through Arel it crawled to 50
transactions per second, about the same as SQLite.
%p
This was maybe 2011, and there was no doubt anymore: not the database, but Ruby itself was the
speed hog. I aborted.
%p
In 2013 I bought a Raspberry Pi and of course I wanted to use it with Ruby. Alas… slow Pi + slow Ruby = not good.
I gave up.
%p So then the clarity came with the solution: build your own Ruby. I had already started designing a bit on the beach.
%p Still, daunting. But maybe just possible…

View File

@ -0,0 +1,29 @@
%h2#the-c-machine The C machine
%p Software engineers have clean brains, scrubbed into full C alignment over decades. A few rebels (Klingons?) remain on embedded systems, but even most of those strive towards POSIX compliance.
%p In other words, since all programming ultimately boils down to C, libc forms the bridge to the kernel/machine. All… all but a small village in the northern (cold) parts of Europe (Antskog), where…
%p So I had a look at what we are talking about.
%h2#the-issue The issue
%p
Many, especially embedded developers, have noticed that the standard C library has become quite heavy
(2 megabytes). Since it provides a defined API (POSIX) and broad functionality on a plethora of operating systems and CPUs, even across different ABIs (application binary interfaces) and compilers/linkers, that is no wonder.
%p uClibc or dietlibc get the size down, diet especially (about 130k). So that's OK then. Or is it?
%p Then I noticed that the real issue is not the size. Even my Pi has 512 MB, and of course libc gets paged anyway.
%p The real issue is the step into the C world: extern functions, call marshalling, and the question is, for what?
%p After all, the C library was created to make it easier for C programs to use the kernel. And I have no intention of writing any more C.
%h2#ruby-corestd-lib Ruby core/std-lib
%p Of course the Ruby core and standard libraries were designed to do for Ruby what libc does for C. Unfortunately they are badly designed and suffer from the brainwashing above (designed around C calls).
%p
Since salama is pure Ruby, there is a fair amount of functionality that would be nicer to provide straight in Ruby. As gems of course, for everybody to see and fix.
For example, even if there were to be a printf (which I dislike), it would be easy to code in Ruby.
%p What is needed is the underlying write to stdout.
%h2#solution Solution
%p To get salama up and running, i.e. to have a “ruby” executable, very few kernel calls are really needed: file open, read, write to stdout, and brk.
%p So the way this will go is to write syscalls where needed.
%p Having tried to reverse-engineer uc, diet and musl, it seems best to go straight to the source.
%p
Most of that is of course for Intel, but eax becomes r7 and after that the arguments go in r0 and up, so it is not too bad. The definitive guide for ARM is here:
%a{:href => "http://sourceforge.net/p/strace/code/ci/master/tree/linux/arm/syscallent.h"} http://sourceforge.net/p/strace/code/ci/master/tree/linux/arm/syscallent.h
But it doesn't include the arguments (only their number), so
%a{:href => "http://syscalls.kernelgrok.com/"} http://syscalls.kernelgrok.com/
can be used.
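%p
For orientation, here is the handful of calls needed, with their ARM EABI numbers as listed in those tables, sketched as a Ruby hash (a summary for illustration, not salama's actual code):
%pre
  %code
    :preserve
      # The syscall number goes into r7, the arguments into r0 and up, then "svc 0".
      SYSCALLS = {
        exit:  1,    # r0 = exit status
        read:  3,    # r0 = fd, r1 = buffer address, r2 = length
        write: 4,    # r0 = fd, r1 = buffer address, r2 = length
        open:  5,    # r0 = path address, r1 = flags, r2 = mode
        brk:   45    # r0 = new program break (0 just queries it)
      }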
%p So there, getting more metal by the minute. But the time from writing this to a hello world was 4 hours.

View File

@ -0,0 +1,12 @@
%p Part of the reason I even thought this was possible was that I had bumped into Metasm.
%p
Metasm creates native code in 100% Ruby, either from assembler or even (partially) C, and for many CPUs too.
It also writes many binary formats, ELF among them.
%p
Still, I wanted something small that I could understand easily, as it was clear it would have to be changed to fit.
As no external assembler file format was planned, the whole parsing-based approach was inappropriate.
%p
Luckily I found a small library, as, that did ARM only and was just a few files. After removing unneeded parts
like parsing, and some reformatting, I added an assembler-like DSL.
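%p
To give a flavour, an assembler-style DSL in Ruby can be as simple as the sketch below (method names and register handling are invented here for illustration, not the actual arm layer):
%pre
  %code
    :preserve
      class AssemblerSketch
        attr_reader :instructions

        def initialize
          @instructions = []
        end

        def mov(register, value)
          @instructions << [:mov, register, value]
        end

        def svc(number)
          @instructions << [:svc, number]
        end
      end

      asm = AssemblerSketch.new
      asm.mov :r0, 1    # stdout file descriptor
      asm.mov :r7, 4    # write syscall number on arm linux
      asm.svc 0         # trap into the kernel
      asm.instructions  # => [[:mov, :r0, 1], [:mov, :r7, 4], [:svc, 0]]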
%p This layer (the arm subdirectory) said hello after about two weeks of work.
%p I also got QEMU to work and can thus develop without the actual Pi.

View File

@ -0,0 +1,24 @@
%p Both “ends”, parsing and machine code, were relatively clear-cut. Now it is into unknown territory.
%p I had ported the Kaleidoscope LLVM tutorial language to ruby-llvm last year, so there were some ideas floating around.
%p
The idea of basic blocks as the smallest units of code without branches was pretty clear. Using those as jump
targets was also straightforward. But how to get from the AST to ARM instructions was not, and took some trying out.
%p
In the end, or rather now, it is the AST layer that “compiles” itself into the Vm layer. The Vm layer then assembles
itself into Instructions.
%p
General instructions are part of the Vm layer, but the code picks up derived classes and thus makes machine-dependent
code possible. So far, so good.
%p
Register allocation was (and is) another story. Argument passing and local variables do work now, but there is definitely
room for improvement there.
%p
To get anything out of a running program I had to implement putstring (easy) and putint (difficult). Surprisingly,
division is not easy, and even when pinned to 10 (divide by 10) it is quite strange. Still, it works. While I was at it
writing assembler, I found a Fibonacci in 10 or so instructions.
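%p
For the curious, dividing by 10 without a divide instruction usually comes down to a shift-and-add approximation of multiplying by 1/10. Roughly this shape (a generic sketch of the classic sequence, not salama's exact instructions):
%pre
  %code
    :preserve
      def div10(n)
        q = (n >> 1) + (n >> 2)   # q is roughly n * 0.75
        q += (q >> 4)             # refine towards n * 0.8
        q += (q >> 8)
        q += (q >> 16)
        q >>= 3                   # now q is roughly n / 10, possibly one too small
        r = n - q * 10            # remainder, between 0 and 19
        q + (r > 9 ? 1 : 0)       # correct the off-by-one
      end

      div10(1234)  # => 123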
%p
To summarise: function definition and calling (including recursion) work.
If and while structures work, as do some operators, and now it is easy to add more.
%p
So we have a Fibonacci in Ruby, using a while implementation, that can be executed by salama and outputs the
correct result. After a total of seven weeks this is much more than expected!
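%p
That while-based Fibonacci has roughly the following shape, reconstructed here for illustration (only assignments, a while loop and operators, no iterators or blocks; not necessarily the exact test source):
%pre
  %code
    :preserve
      def fibonacci(n)
        a = 0
        b = 1
        i = 0
        while i < n
          t = a + b
          a = b
          b = t
          i = i + 1
        end
        a
      end

      fibonacci(10)  # => 55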

View File

@ -0,0 +1,29 @@
%p Parsing is difficult, the theory incomprehensible and the older tools cryptic. At least for me.
%p
And then I heard that recursive descent is easy, and used even by LLVM. Formalised as PEG, parsing libraries exist, and in Ruby
they have DSLs and are suddenly quite understandable.
%p
Of the candidates, I had at first very positive experiences with Treetop. Upon continuing I found the code
generation aspect not just clumsy (after all, you can define methods in Ruby), but it also interferes unnecessarily
with code control. On top of that, conversion into an AST was not easy.
%p After looking around I found Parslet, which pretty much removes all those issues. Namely:
%ul
%li It does not generate code, it generates methods. And it has a nice DSL.
%li
It transforms to basic Ruby types and has the notion of a transform,
so there is an easy and clean way to create an AST.
%li One can use Ruby modules to partition a larger parser.
%li Minimal dependencies (one file).
%li Active use and development.
%p
So I was sold, and I got up to speed quite quickly. But I also found out how fiddly such a parser is with regard
to ordering and whitespace.
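%p
A toy Parslet grammar (not salama's actual one, just an illustration) shows both the DSL and why whitespace has to be handled explicitly in every rule:
%pre
  %code
    :preserve
      require 'parslet'

      class MiniParser < Parslet::Parser
        rule(:space)   { match('\s').repeat(1) }
        rule(:space?)  { space.maybe }
        rule(:integer) { match('[0-9]').repeat(1).as(:integer) >> space? }
        rule(:plus)    { str('+') >> space? }
        rule(:sum)     { integer.as(:left) >> plus >> integer.as(:right) }
        root(:sum)
      end

      MiniParser.new.parse("1 + 2")
      # => {:left=>{:integer=>"1"@0}, :right=>{:integer=>"2"@4}}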
%p
I spent some time building quite a solid test framework, testing the different rules separately and also the
stages separately, so things would not break accidentally while growing.
%p
After about another two weeks I was able to parse functions, both calls and definitions, ifs and whiles, and of course the basic
types, integers and strings.
%p
With the great operator support it was a breeze to create all fifteen-ish binary operators. Even Array and Hash constant
definitions were very quick. All in all surprisingly painless, thanks to Kasper!

View File

@ -0,0 +1,44 @@
%p It's such a nice name, crystal. My first association is clarity, and that is exactly what I am trying to achieve.
%p But I've been struggling a bit to achieve any clarity on the topic of the system boundary: where does OO stop? I mean, I can't very well define method lookup in Ruby syntax, as that itself involves method lookups. But tail recursion is so boring, it just never stops!
%h4#kernel Kernel
%p In the design phase (yes, there was one!) I had planned to use lambdas. A little naive maybe, as they are of course objects, so calling them means a method resolution.
%p So I'm settling for module methods. I say settling because that of course always makes the module object available, though I don't see any use for it. A waste of space (one register) and time (loading it), but no better ideas are forthcoming.
%p The place for these methods, and I'll go into which ones in a second, is the Kernel. And finally the name makes sense too: that is its original (pre-1.9) place, as a module that Object includes, i.e. “below” even Object.
%p So Kernel is the place for methods that are needed to build the system and may not be called on objects. Simple.
%p In other words, anything that can be coded on normal objects should be. But when that stops being possible, Kernel is the place.
%p And what are these functions? get_instance_variable, and set too. Same for functions. Strangely, these may in turn rely on functions that can be coded in Ruby, but at the heart of the matter is an indexed operation, i.e. object[2].
%p This functionality, i.e. getting the nth datum in an object, is essential, but C makes a good point of it having no place in a public API. So it needs to be implemented in a “private” part and used in a safe manner. More on the layers emerging below.
%p The Kernel is a module in salama that defines functions which return function objects. So the code is generated, instead of parsed. An essential distinction.
%h4#system System
%p
It's an important side note on that Kernel definition above that it is
%em not
the same as the system access functions. Those are in their own module and may (or must) use the kernel to implement their functionality. But they are not the same.
%p Kernel is the VM's “core”, if you want.
%p System is the access to the operating system functionality.
%h4#layers Layers
%p So from that Kernel idea, three layers have now emerged, three ways in which code is created.
%h5#machine Machine
%p The lowest layer is the Machine layer. This layer generates Instructions, or sequences thereof. So of course there is an Instruction class with derived classes, but also Block, the smallest linear sequence of Instructions.
%p Also there is an abstract RegisterMachine that is mostly a mediator to the current implementation (ArmMachine). The machine has functions that create Instructions.
%p A few machine functions return Blocks, or append their instructions to blocks. This is really more of a macro layer. Usually they are small, but div10 for example is a real ten-instruction beauty.
%h5#kernel-1 Kernel
%p The Kernel functions return function objects. Kernel functions have the same name as the function they implement, so Kernel::putstring defines a function called putstring. Function objects (Vm::Function) carry entry/exit/body code, receiver/return/argument types, and a little more.
%p The important thing is that these functions are callable from Ruby code. Thus they form the glue from the next layer up, which is coded in Ruby, to the machine layer. In a way the Kernel “exports” the machine functionality to salama.
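%p
As a purely illustrative sketch (the class and method bodies below are stand-ins, not salama's real code), a kernel function in this sense is just a module method that builds and returns a function object:
%pre
  %code
    :preserve
      # Stand-in for a function object carrying entry/body/exit code.
      FunctionSketch = Struct.new(:name, :entry, :body, :exit)

      module KernelSketch
        def self.putstring
          func = FunctionSketch.new(:putstring, [], [], [])
          # the body would be filled with machine-layer instructions,
          # e.g. a write syscall on the string's address and length
          func.body << [:syscall, :write]
          func
        end
      end

      KernelSketch.putstring.name  # => :putstring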
%h5#parfait Parfait
%p Parfait is a thin layer implementing a mini-minimal OO system. Sure, all the usual suspects like strings and integers are there, but they only implement what is really, really necessary. For example, strings mainly have new, equals and put.
%p Parfait is heavy on Object/Class/Metaclass functionality, object instances and method lookup. All things needed to make an OO system OO. Not so much “real” functionality here, more creating the ability for it.
%p Stdlib would be the next layer up, implementing the whole of Ruby's functionality in terms of what Parfait provides.
%p The important thing here is that Parfait is written completely in Ruby. Meaning it gets parsed by salama like any other code, then transformed into executable form and written out.
%p Any executable that salama generates will have Parfait in it. But only the final version of salama, as a Ruby VM, will carry the whole stdlib and parser along.
%h4#salama Salama
%p
Salama uses the Kernel and Machine layers directly when creating code. Of course.
The closest equivalent to salama would be a compiler, and so it is its job to create code (machine-layer objects).
%p But it is my intention to keep that as small as possible. And the good news is it's all Ruby :-)
%h5#extensions Extensions
%p I just want to mention the idea of extensions, which is a logical step for a minimal system. Of course they would be gems, but the interesting thing is that they (like salama) could:
%ul
%li use salama's existing kernel/machine abstraction to define new functionality that is not possible in Ruby
%li define new machine functionality, adding kernel-type APIs, to create wholly new, possibly hardware-specific functionality
%p I am thinking graphics acceleration, GPU usage, vector APIs, that kind of thing. In fact I aim to implement the whole floating-point functionality as an extension (as it is clearly not essential for OO).

View File

@ -0,0 +1,56 @@
%p
I was just reading my Ruby book, wondering about functions and blocks and the like, as one does when implementing
a VM. Actually, the topic I was struggling with was receivers, the pesky self, when I got the exception.
%p And while they say two steps forward, one step back, this goes the other way around.
%h3#one-step-back One step back
%p
Having just learnt assembler, this is the first time I am really considering how functions are implemented, and how the stack is
used for that. Sure, I had heard about it, but the details were vague.
%p
Of course a function must know where to return to. I mean the memory address, as this can't very
well be fixed at compile time. In effect it must be passed to the function. But as programmers we
don't want to have to do that all the time, and so it is passed implicitly.
%h5#the-missing-link The missing link
%p
The ARM architecture makes this nicely explicit. There, a call is actually called branch with link.
This rubbed me the wrong way for a while, as it struck me as an exceedingly bad name. Until I “got it”,
that is. The link is the link back; well, that was simple. But the thing is that the “link” is
put into the link register.
%p
This never struck me as meaningful, until now. Of course it means that “leaf” functions do not
need to touch it. Leaf functions are functions that do not call other functions, though they may
do syscalls, as the kernel restores all registers. On other CPUs the return address is pushed onto
the stack, but on ARM you have to do that yourself. Or not, and save the instruction if you're so inclined.
%h5#the-hidden-argument The hidden argument
%p
But the point here is, that this makes it very explicit. The return address is in effect just
another argument. It usually gets passed automatically by compiler generated code, but never
the less. It is an argument.
%p
The “step back” is to make this argument explicit in the VM code, thus making its handling,
i.e. passing or saving, explicit too. And thus having less magic going on, because you can't
understand magic (you've got to believe it).
%h3#two-steps-forward Two steps forward
%p And so the thrust becomes clear, I hope. We are talking about exceptions, after all.
%p
Because to those who have not read the Windows calling convention on exception handling, or even
heard of the DWARF specification thereof, I say: don't. It melts the brain.
You have to be so good at playing computer in your head, it's not healthy.
%p
Instead, we make things simple and explicit. An exception is, after all, just a different way for
a function to return. So we need an address for it to return to.
%p
And as we have just made the normal return address an explicit argument, we simply make the
exception return address an argument too. And presto.
%p
Even the briefest consideration of how we generate those exception return addresses
(landing pads? what a strange name) leads to the conclusion that if a function does not do
any exception handling, it just passes on the same address that it got itself. Thus a
raised exception would jump clear over such a function.
%p
Since we have now made exceptions normal code (albeit with an exceptional name :-)), control
flow to and from them becomes quite normal too.
%p
To summarize: each function now has a minimum of three arguments: the self, the return address and
the exception address.
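%p
As a conceptual sketch only (modelling the two return addresses as callables, which is obviously not how the generated code works), the idea looks like this:
%pre
  %code
    :preserve
      # Both the normal and the exceptional return "address" are explicit arguments.
      def divide(a, b, return_to, raise_to)
        return raise_to.call("division by zero") if b == 0
        return_to.call(a / b)
      end

      divide(10, 2,
        ->(result) { puts result },   # normal return
        ->(error)  { warn error })    # exceptional return
      # A function that does no exception handling just passes raise_to on
      # unchanged, so a raise jumps clear over it.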
%p We have indeed taken a step forward.

View File

@ -0,0 +1,44 @@
%p I am not stuck. I know I'm not. Just because there is little visible progress doesn't mean I'm stuck. It may just feel like it, though.
%p But like little cogwheels in a clock, I can hear the background process ticking away, and sometimes there is a gong.
%p What I wasn't stuck on is where to draw the layers for the VM.
%h3#layers Layers
%p
Software engineers like layers. Like the onion boy. You can draw boxes, make presentations and convince your boss.
They help us reason about the software.
%p
In this case the model was to go from the AST layer to a VM layer, via a compile method that could just as well have been a
visitor.
%p
That didn't work, too big a step, and so it went from AST, to VM, to Neumann. But I couldn't decide
on the abstraction of the virtual machine layer. Specifically, when you have a send (and you have
so many sends in Ruby), do you:
%ul
%li model it as a VM instruction (a bit like Java)
%li implement it as a couple of instructions, like resolve, a loop and call
%li go to a version that is clearly translatable to Neumann, say without the value-type implementation
%p
Obviously the third is where we need to get to, as the next step is the Neumann layer and somehow
we need to get there. In effect one could take those three and present them as layers, not
as alternatives like I have.
%h3#passes Passes
%p
And then the little cog went click, and the idea of passes resurfaced. LLVM has these passes on
the code tree, which is probably where it surfaced from.
%p
So we can have as high a degree of abstraction as possible when going from AST to code,
and then have as many passes over that as we want or need.
%p
Passes can be order-dependent and create more and more detail. To solve the above layer
conundrum, we just do a pass for each of those options.
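%p
A minimal sketch of the mechanism (class names are invented for illustration, not salama's actual passes):
%pre
  %code
    :preserve
      # Each pass takes the current representation and returns a more detailed one.
      class Pipeline
        def initialize
          @passes = []
        end

        def add(pass)
          @passes << pass
          self
        end

        def run(code)
          @passes.inject(code) { |current, pass| pass.run(current) }
        end
      end

      # e.g. Pipeline.new.add(ExplicitSendPass.new).add(NeumannPass.new).run(ast)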
%p The two main benefits that come from this are:
%p
1 - At each point, i.e. during and after each pass, we can analyse the data. Imagine for example
that we had picked the second layer option; that means there would never have been a
representation where the sends were explicit. Thus any analysis of them would be impossible, or would need reverse engineering (e.g. call-graph analysis, or class caching).
%p
2 - Passes can be gems or come from other sources. The mechanism can be relatively oblivious to
specific passes. And they make the transformation explicit, i.e. easier to understand.
In the example of having picked the second layer level, one would have to patch the
implementation of that transformation to achieve a different result. With passes it would be
a matter of replacing a pass, thus explicitly stating “I want a non-standard send implementation”.
%p Actually, a third benefit is that it makes testing simpler and more modular: just test the initial AST-to-code step and then mostly the results of passes.

View File

@ -0,0 +1,77 @@
%p In a picture, or when taking a picture, the frame is very important. It sets whatever is in the picture into context.
%p
So it is a bit strange that having a
%strong frame
had the same sort of effect for me in programming.
I made the frame explicit, as an object, with functions and data, and immediately the whole
message sending became a whole lot clearer.
%p
You read about frames in calling conventions, or otherwise when talking about the machine stack.
It is the area a function uses for storing data, be it arguments, locals or temporary data.
Often a frame pointer is used to establish a frame's dynamic size and things like that.
But since it is all so implicit, handled by code very few programmers ever see, it was
all a bit muddled for me.
%p My frame has: the return and exceptional return addresses, self, arguments, locals and temps,
%p and methods to: create a frame, get a value to or from a slot in args/locals/temps, return or raise.
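%p
As a sketch (attribute and method names here are my own, for illustration only):
%pre
  %code
    :preserve
      class FrameSketch
        attr_reader :return_address, :exception_address, :me

        def initialize(return_address, exception_address, me)
          @return_address    = return_address
          @exception_address = exception_address
          @me                = me   # the receiver, "self"
          @arguments = []
          @locals    = []
          @temps     = []
        end

        # get a value to or from a slot
        def get_local(index)
          @locals[index]
        end

        def set_local(index, value)
          @locals[index] = value
        end
      end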
%h3#the-divide-compile-and-runtime The divide, compile and runtime
%p
I saw
%a{:href => "http://codon.com/compilers-for-free"} Tom's video on free compilers
and read the underlying
book on
%a{:href => "http://www.itu.dk/people/sestoft/pebook/jonesgomardsestoft-a4.pdf"} Partial Evaluation
a bit, and it helped to make the distinctions clearer. As did the Layers and Passes post.
And the explicit Frame.
%p
The explicit frame established the VM much better too. All actions of the VM happen
in terms of the frame. Sending is creating a new one, loading it, finding the method and branching
there. Getting and setting variables is just indexing into the frame at the right slot, and so on.
Instance variables are a send to self, and on it goes.
%p
The great distinction is in the end quite simple: compile time or run time. And the passes
idea helps, in that I start with the most simple implementation against my VM. Then I have a data structure and can keep expanding it to “implement” more detail. Or I can analyse it to remove
redundancies, i.e. optimize. But the point is that in both cases I can just think about data structures
and what to do with them.
%p
And what I can do with my data (which is of course partially instruction sequences, but that's beside the point) really always depends on the great question: compile time vs run time.
What is constant, I can do immediately. Otherwise, leave it for later. Simple.
%p
An example, an attribute accessor: a simple send. I build a frame and set the self. Now a fully dynamic
implementation would leave it at that. But I can check whether I know the type; if it's not a
reference (i.e. it is an integer) we can raise immediately. Also, a reference tags the class for when
that is known at compile time. If so, I can determine the layout at compile time and inline the
getter's implementation. If not, I could cache, but that's for later.
%p
As a further example, when one function has two calls on the same object, the layout
need only be retrieved once; i.e. in the sequence getType, determine method, call, the first
step can be omitted for the second call, as the layout is constant.
%p
And as a final bonus of all this clarity, I immediately spotted the inconsistency in my own design: the frame I designed holds local variables, but the caller needs to create it. The caller cannot
possibly know the number of local variables, as that is decided by the invoked method,
which is only known at run time. So we clearly need a two-level thing here: one part
that the caller creates, and one that the receiver creates.
%h3#messaging-and-slots Messaging and slots
%p It is interesting to relate what emerges to concepts learned over the years:
%p
There is this idea of message passing, as opposed to function calling. Everyone I know learned
an imperative language as their first language, and so message passing is a bit like vegetarian
food, all right for some. But of course there is a distinct difference in dynamic languages, as
one does not know beforehand the actual method invoked. Also, exceptions make the return trickier,
and default values even affect the argument passing, which then has to be augmented by the receiver.
%p
One main difficulty I had with the message-passing idea has always been what the message actually is.
But now that I have the frame, I know exactly what it is: it is the frame, nothing more, nothing less.
(Postscript: later I introduced the Message object, which gets created by the caller, while the Frame
is what the callee creates.)
%p
Another interesting observation is the (hopefully) golden path this design takes between Smalltalk
and Self. In Smalltalk (like Ruby and…) all objects have a class. But some of the Smalltalk researchers went on to do
= succeed "," do
%a{:href => "http://en.wikipedia.org/wiki/Self_(programming_language)"} Self
which has no classes, only objects. This was supposed to make things easier and faster. Slots were
a bit like instance variables, but there were no classes to rule them.
%p
Now in Ruby, any object can have any variables anyway, but they incur a dynamic lookup. Types, on
the other hand, are like slots, and keeping each Type constant (while an object can change layouts)
makes it possible to have completely dynamic behaviour (Smalltalk/Ruby)
%strong and
use a slot-like (Self) system with constant lookup speed. Admittedly the constancy only helps on cache hits, but
as most systems are not dynamic most of the time, that is almost always.

View File

@ -0,0 +1,49 @@
%p It has been a bit of a journey, but now we have arrived: Salama is officially named.
%h3#salama Salama
%p
Salama is a
= succeed "," do
%strong real word
not made up or an acronym (plus).
%p
It is a word of my
%strong home-country
Finland, a Finnish word (double plus).
%p
Salama means
%strong lightning
(or flash), and that is fast (double double plus) and bright.
%p
As some may have noticed, in most places my nick is
= succeed "." do
%strong dancinglightning
Nice :-)
%p
Also
%strong my wife
suggested it, so it always reminds me of her.
%h4#journey Journey
%p I started with crystal, which I liked. It speaks of clarity. It is related to Ruby. All was good.
%p
But I was not the first to have this thought: the name is taken, as I found out by
chance. Ary Borenszweig started the
%a{:href => "http://crystal-lang.org/"} project
already two
years ago, and they not only have a working system, they even compile themselves.
%p
Alas, Ary started out with the idea of Ruby on rockets (i.e. fast), but when the
dynamic aspects came up (as they did for me a month ago), he went for speed, to be
precise for a static system, not for Ruby.
So his Crystal is now its own language with Ruby-ish style, but not Ruby semantics.
%p
That is why I had not found it. But when I did, we talked, all was friendly, and we
agreed I would look for a new name.
%p
And so I did, and many were taken. Kide (crystal in Finnish) was a step on the way,
as was ruby-in-ruby. Many candidates were explored and discarded, like broom
(basic ruby object oriented machine), or som (simple object machine), even ahimsa.
%h4#official Official
%p But then I found it, or rather we did, as it was a suggestion from my wife: Salama.
%p
After I found the name I made sure to claim it: I published first versions of gems
for salama and its sub-modules. They don't work of course, but at least the name is
taken on RubyGems too. Of course the GitHub name is too.
%p So now I can get on with things at lightning speed :-)

View File

@ -0,0 +1,81 @@
%p
While trying to figure out what I am coding, I had to attack this storage format before I wanted to. The
immediate need is for code dumps that are concise but readable. I started with YAML, but that just takes
too many lines, so it's too difficult to see what is going on.
%p
I just finished it. It's a sort of condensed YAML I call sof (salama object file), but I want to take a
moment to reflect on why I did this, what the bigger picture is, and where sof may go.
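%p
To illustrate the line-count complaint: even a small nested structure spreads over many lines when dumped as YAML (the object below is just a made-up example, not salama's real output):
%pre
  %code
    :preserve
      require 'yaml'

      instruction = { class: "Branch",
                      target: { class: "Block", name: "loop_start", codes: [1, 2, 3] } }
      puts YAML.dump(instruction)
      # ---
      # :class: Branch
      # :target:
      #   :class: Block
      #   :name: loop_start
      #   :codes:
      #   - 1
      #   - 2
      #   - 3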
%h3#program-lifecycle Program lifecycle
%p
Let's take a step back to mother Smalltalk: there was the image. The image was/is the state of all the
objects in the system. Even threads, everything. Absolute object thinking taken to the ultimate.
A great idea of course, but doomed to ultimately fail, because no man is an island (and so no VM is either).
%h4#development Development
%p
Software development is a team sport, a social activity at its core. This is not always realised
when the focus is too much on the outcome, but when you look at it, everything is done in teams.
%p
The other thing not really taken into account in the standard development model is that it is a process in
time that really only gets juicy with the first customer-released version. Then you get into branches for bugs
and features, versions with major and minor numbers, and before long you're in a jungle of code.
%h4#code-centered Code centered
%p
But all that effort is concentrated on code. OK, nowadays schema evolution is part of the game, so the
existence of data is acknowledged, but only as an external thing. Nowhere near that Smalltalk model.
%p
But of course a truly object-oriented program is not just code. It's data too. Maybe currently “just”
configuration, enums/constants and locales, but that is exactly my point.
%p
The lack of a defined data/object storage is holding us back, making all our programs fruit flies:
they live a short time and die. A program has no way of “learning”, of accumulating data/knowledge
to use in its next invocation.
%h4#optimisation-example Optimisation example
%p
Let's take optimisation as an example. A developer runs tests (rubyprof/valgrind or something)
with some output and makes program changes accordingly. But there are two obvious problems.
Firstly, the data is collected in development, not production. Secondly, and more importantly, a person is
needed.
%p
Of course a program could quite easily monitor itself, possibly over a long time, possibly only when
not at peak load. And surely some optimisations could be automated; a bit like the O1..On compiler
switches, more and more effort could be exerted on critical regions. Possibly all the way to
super-optimisation.
%p
But even if we did this, and a program would improve/jit itself, the fruits of this work are only usable
during that run of that program. Future invocations, just like future versions of that program, do not
benefit. And thus start again, just like in Groundhog Day.
%h3#storage Storage
%p
So to make that optimisation example work, we would need storage: theoretically we could make the program
change its own executable/object files, in ruby even its source. Theoretically, because we have no
representation of the code to work on.
%p
In salama we do have an internal representation, both at the code level (ast) and at the compiled code
level (CompiledMethod, Instructions and friends).
%h4#storage-format Storage Format
%p
Going back to the Image, we can ask why it was doomed to fail: because of the binary,
proprietary implementation. Not because of the idea as such.
%p
Binary data needs either a rigorous specification and/or software to work on it. Work, what work?
We need to merge the data between installations, maintain versions and branches. That sounds a lot like
version control, because it basically is. Of course this “could” have been solved by the smalltalk
people, but wasn't. I think it's fair to say that git was the first system to solve that problem.
%p
And git of course works with diffs, and so for a 3-way merge to be successful we need a text format.
That is why I started with yaml, and why sof is also text-based.
%p The other benefit is of course human readability.
%p
So now we have an object file format* in text, and we have git. What we do with it is up to us.
(* well, I only finished the writer; reading/parsing is “left as an exercise for the reader” :-)
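%p
To make the verbosity point concrete, here is a small comparison. Only the yaml part is real ruby; the
condensed line at the end is a hypothetical sketch of what a sof rendering could look like, not the actual format.
%pre
%code
:preserve
require "yaml"

# a tiny object graph, purely for illustration
Point = Struct.new(:x, :y)
line  = { :start => Point.new(0, 0), :finish => Point.new(10, 20) }

puts YAML.dump(line)    # several lines per object, hard to scan in a big dump
# a condensed, sof-like rendering (hypothetical) might keep it on one line:
#   Hash{start: Point(0,0), finish: Point(10,20)}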
%h4#sof-as-object-file-format Sof as object file format
%p
OK, I'll sketch it a little: Salama would use sof as its object file format, and only sof would ever be
stored in git. For developers to work, tools would create source, and when that is edited, compile it to sof.
%p
A program would be a repository of sof and resource files. Some convention for load order would be helpful,
and some “area” where programs may collect data or changes to the program. Some may of course alter the
sofs directly.
%p
How, when and how automatically changes are merged (via git) is up to developer policy. But it is
easily imaginable that data in program-designated areas gets merged back into the “mainstream” automatically.

View File

@ -1,88 +0,0 @@
While trying to figure out what i am coding i had to attack this storage format before i wanted to. The
immediate need is for code dumps, that are concise but readable. I started with yaml but that just takes
too many lines, so it's too difficult to see what is going on.
I just finished it, it's a sort of condensed yaml i call sof (salama object file), but i want to take the
moment to reflect why i did this, what the bigger picture is, where sof may go.
### Program lifecycle
Let's take a step back to mother smalltalk: there was the image. The image was/is the state of all the
objects in the system. Even threads, everything. Absolute object thinking taken to the ultimate.
A great idea off course, but doomed to ultimately fail because no man is an island (so no vm is either).
#### Development
Software development is a team sport, a social activity at it's core. This is not always realised,
when the focus is too much on the outcome, but when you look at it, everything is done in teams.
The other thing not really taken into account in the standard developemnt model is that it is a process in
time that really only gets jucy with a first customer released version. Then you get into branches for bugs
and features, versions with major and minor and before long you'r in a jungle of code.
#### Code centered
But all that effort is concentrated on code. Ok nowadays schema evlolution is part of the game, so the
existance of data is acknowledged, but only as an external thing. Nowhere near that smalltalk model.
But off course a truely object oriented program is not just code. It's data too. Maybe currently "just"
configuration and enums/constants and locales, but that is exactly my point.
The lack of defined data/object storage is holding us back, making all our programs fruit-flies.
I mean it lives a short time and dies. A program has no way of "learning", of accumulating data/knowledge
to use in a next invocation.
#### Optimisation example
Let's take optimisation as an example. So a developer runs tests (rubyprof/valgrind or something)
with some output and makes program changes accordingly. But there are two obvious problems.
Firstly the data is collected in development not production. Secondly, and more importantly, a person is
needed.
Of course a program could quite easily monitor itself, possibly over a long time, possibly only when
not at epak load. And surely some optimisations could be automated, a bit like the O1 .. On compiler
switches, more and more effort could be exerted on critical regions. Possibly all the way to
super-optimisation.
But even if we did this, and a program would improve/jit itself, the fruits of this work are only usable
during that run of that program. Future invocations, just like future versions of that program do not
benefit. And thus start again, just like in Groundhog day.
### Storage
So to make that optimisation example work, we would need a storage: Theoretically we could make the program
change it's own executable/object files, in ruby even it's source. Theoretically, as we have no
representation of the code to work on.
In salama we do have an internal representation, both at the code level (ast) and the compiled code
(CompiledMethod, Intructions and friends).
#### Storage Format
Going back to the Image we can ask why was it doomed to fail: because of the binary,
proprietary implementation. Not because of the idea as such.
Binary data needs either a rigourous specification and/or software to work on it. Work, what work?
We need to merge the data between installations, maintain versions and branches. That sounds a lot like
version control, because it basically is. Off course this "could" have been solved by the smalltalk
people, but wasn't. I think it's fair to say that git was the first system to solve that problem.
And git off course works with diff, and so for a 3-way merge to be successful we need a text format.
Which is why i started with yaml, and which is why also sof is text-based.
The other benefit is off course human readability.
So now we have an object file * format in text, and we have git. What we do with it is up to us.
(* well, i only finished the writer. reading/parsing is "left as an excercise for the reader":-)
#### Sof as object file format
Ok, i'll sketch it a little: Salama would use sof as it's object file format, and only sof would ever be
stored in git. For developers to work, tools would create source and when that is edited compile it to sof.
A program would be a repository of sof and resource files. Some convention for load order would be helpful
and some "area" where programs may collect data or changes to the program. Some may off course alter the
sof's directly.
How, when and how automatically changes are merged (via git) is up to developer policy . But it is
easily imaginable that data in program designated areas get merged back into the "mainstream" automatically.

View File

@ -0,0 +1,71 @@
%p The time of introspection is coming to an end and I am finally producing executables again. (hurrah)
%h3#block-and-exception Block and exception
%p
Even though neither ruby blocks nor exceptions are implemented yet, I have figured out how to do them, which is sort of good news.
I'll see of course when the day comes, but a plan is made, and it is this:
%p No information lives on the machine stack.
%p
Maybe it's easier to understand this way: all objects live in memory primarily. Whatever gets moved onto the machine
stack is just a copy and, for purposes of the gc, does not need to be considered.
%h3#objects-4-registers 4 Objects, 4 registers
%p As far as I have determined, the vm needs internal access to exactly four objects. These are:
%ul
%li Message: the currently received one, i.e. the one whose sending led to the current method being called
%li Self: this is an instance variable of the message
%li Frame: local and temporary variables of the method. Also part of the message.
%li NewMessage: where the next call is prepared
%p And, as stated above, all these objects live in memory.
%h3#single-set-instruction Single Set Instruction
%p
Self and frame are duplicated information, because that makes them easier to transfer. After initial trying, I settled on a
single Instruction to move data around in the vm: Set. It can move instance variables from any of the four objects to any
other of them.
%p
The implementation of Set ensures that any move to the self slot in Message gets duplicated into the Self register. The same
goes for the frame; both happen once per method and both are read-only afterwards, so they don't need updating later.
%p Set, like other instructions, may use any other variables at any time. Those registers (r4 and up) are scratch.
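%p
To illustrate, here is a sketch of Set as a plain data object. The class name, slot names and layout are
assumptions for illustration, not the actual implementation.
%pre
%code
:preserve
# sketch only: all names here are assumed
class SetInstruction
  OBJECTS = [:message, :self, :frame, :new_message]

  def initialize(from_object, from_ivar, to_object, to_ivar)
    raise "unknown object" unless OBJECTS.include?(from_object) && OBJECTS.include?(to_object)
    @from_object, @from_ivar = from_object, from_ivar
    @to_object,   @to_ivar   = to_object,   to_ivar
  end

  def to_s
    "Set " + @from_object.to_s + "." + @from_ivar.to_s +
      " -> " + @to_object.to_s + "." + @to_ivar.to_s
  end
end

# moving a local from the current frame into the next call's argument slot
puts SetInstruction.new(:frame, :local_1, :new_message, :argument_1)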
%h3#simple-call Simple call
%p
This makes calling relatively simple and thus easy to understand. To make a call we must be in a method, i.e. Message,
Self and Frame have been set up.
%p
The method then produces values for the call. This involves operations, and the result of those is stored in a variable
(tmp/local/arg). When all values have been calculated, a NewMessage is created and all data moved there (see Set).
%p
A Call is then quite simple: because of the duplication of Self and Frame, we only need to push the Message to the
machine stack. Then we move the NewMessage to Message, unroll (copy) the Self into its register and assign a new
Frame.
%p
Returning is also not overly complicated: remember that the return value is an instance variable in the
Message object. So when the method is done, the value is there, not, for example, in a dedicated register.
We then need to undo the above: move the current Message to NewMessage, pop the previously pushed message from the
machine stack and unroll the Self and Frame copies.
%p
The caller then continues and can pick up the return from its NewMessage if it is used for further calculation.
It's as if it did everything to build the (New)Message and the return value was immediately filled in.
%p
As I said, often we need to calculate the values for the call, so we need to make calls. This happens in exactly the same
way, and the result is shuffled to a Frame slot (local or temporary variable).
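%p
As a sketch of that calling sequence in plain ruby, with made-up structures standing in for the real
Message and the machine stack (this is an illustration of the convention, not the generated code):
%pre
%code
:preserve
Message = Struct.new(:self_obj, :frame, :arguments, :return_value, :name)

machine_stack = []
message       = Message.new(:main_object, {}, [], nil, :main)
new_message   = Message.new

# caller: fill the new message (this is what the Set instructions do)
new_message.self_obj  = :other_object
new_message.name      = :foo
new_message.arguments = [1]

# call: push the current message, make the new one current, give it a frame
machine_stack.push(message)
message = new_message
message.frame = {}

# ... the method body runs and leaves its result in the message ...
message.return_value = 42

# return: the finished message becomes the caller's new_message again
new_message = message
message     = machine_stack.pop
puts new_message.return_value   # the caller picks up the result here => 42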
%h3#message-creation Message creation
%p
Well, I hear, that sounds good and almost too easy. But… (there's always one, isn't there) what about the Message and Frame
objects, where do you get those from?
%p
And this is true: in C the Message does not exist, it's just data in registers, and the Frame is created on the stack if
needed.
%p And unfortunately we can't really make a call to get/create these objects, as that would create an endless loop. Hmm.
%p We need a very fast way to create and reuse these objects: a bit like a stack. So let's just use a Stack :-)
%p
Of course not the machine stack, but a Stack object: an array which we append to and take from.
It must be global of course, or rather accessible from compiling code. And fast, which may mean we use assembler, or,
if things work out well, we can use the same code as what makes builtin arrays tick.
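%p
A minimal sketch of that idea, assuming a pre-allocated pool that is used like a stack. All names here are
made up for illustration; the real thing would sit much closer to the metal.
%pre
%code
:preserve
Message = Struct.new(:self_obj, :frame, :arguments, :return_value, :name)   # placeholder layout, as above

class MessagePool
  def initialize(count = 1024)
    @messages = Array.new(count) { Message.new }   # all created up front
    @top = 0
  end

  def get                 # grab the next unused message, no allocation at run-time
    message = @messages[@top]
    @top += 1
    message
  end

  def release             # hand the topmost message back
    @top -= 1
  end
end

pool = MessagePool.new
m = pool.get
m.arguments = [1, 2, 3]
pool.release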
%p
Still, this is a different problem and the full solution will need a bit of time. But clearly it is solvable and does
not impact the register usage convention above.
%h3#the-fineprint The fineprint
%p
Just for the sake of completeness: the assumption I made at the beginning of the Simple call section cannot, of course,
always be true.
%p
To boot the vm, we must create the first message by “magic” and place it and the Self (Kernel module reference).
As it can be an empty Message for now, this is not difficult, just one of those little gotchas.

View File

@ -1,83 +0,0 @@
The time of introspection is coming to an end and i am finally producing executables again. (hurrah)
### Block and exception
Even neither ruby blocks or exceptions are implemented i have figured out how to do it, which is sort of good news.
I'll see off course when the day comes, but a plan is made and it is this:
No information lives on the machine stack.
Maybe it's easier to understand this way: All objects live in memory primarily. Whatever get's moved onto the machine
stack is just a copy and, for purposes of the gc, does not need to be considered.
### 4 Objects, 4 registers
As far as i have determined the vm needs internal access to exactly four objects. These are:
- Message: the currently received one, ie the one that in a method led to the method being called
- Self: this is an instance variable of the message
- Frame: local and temporary variables of the method. Also part of the message.
- NewMessage: where the next call is prepared
And, as stated above, all these objects live in memory.
### Single Set Instruction
Self and frame are duplicated information, because then it is easier to transfer. After inital trying, i settle on a
single Instruction to move data around in the vm, Set. It can move instance variables from any of the objects to any
other of the 4 objects.
The implementation of Set ensures that any move to the self slot in Message gets duplicated into the Self register. Same
for the frame, but both are once per method occurances, and both are read only afterwards, so don't need updating later.
Set, like other instructions may use any other variables at any time. Those registers (r4 and up) are scratch.
### Simple call
This makes calling relatively simple and thus easy to understand. To make a call we must be in a method, ie Message,
Self and Frame have been set up.
The method then produces values for the call. This involves operations and the result of that is stored in a variable
(tmp/local/arg). When all values have been calculated a NewMessage is created and all data moved there (see Set)
A Call is then quite simple: because of the duplication of Self and Frame, we only need to push the Message to the
machine stack. Then we move the NewMessage to Message, unroll (copy) the Self into it's register and assign a new
Frame.
Returning is also not overly complicated: Remembering that the return value is an instance variable in the
Message object. So when the method is done, the value is there, not for example in a dedicated register.
So we need to undo the above: move the current Message to NewMessage, pop the previously pushed message from the
machine stack and unroll the Self and Frame copies.
The caller then continues and can pick up the return from it's NewMessage if it is used for further calculation.
It's like it did everything to built the (New)Message and immediately the return value was filled in.
As I said, often we need to calculate the values for the call, so we need to make calls. This happens in exacly the same
way, and the result is shuffled to a Frame slot (local or temporary variable).
### Message creation
Well, i hear, that sounds good and almost too easy. But .... (always one isn't there) what about the Message and Frame
objects, where do you get those from ?
And this is true: in c the Message does not exist, it's just data in registers and the Frame is created on the stack if
needed.
And unfortunately we can't really make a call to get/create these objects as that would create an endless loop. Hmm
We need a very fast way to create and reuse these objects: a bit like a stack. So let's just use a Stack :-)
Off course not the machine stack, but a Stack object. An array to which we append and take from.
It must be global off course, or rather accessible from compiling code. And fast may be that we use assembler, or
if things work out well, we can use the same code as what makes builtin arrays tick.
Still, this is a different problem and the full solution will need a bit time. But clearly it is solvable and does
not impact above register usage convention.
### The fineprint
Just for the sake of completeness: The assumtion i made a the beginning of the Simple Call section, can off course not
possibly be always true.
To boot the vm, we must create the first message by "magic" and place it and the Self (Kernel module reference).
As it can be an empty Message for now, this is not difficult, just one of those little gotachs.

View File

@ -0,0 +1,100 @@
%p The register machine abstraction has been somewhat thin, and it is time to change that.
%h3#current-affairs Current affairs
%p
When I started, I started from the assembler side, getting arm binaries working and of course learning the arm cpu
instruction set in assembler mnemonics.
%p
Not having
%strong any
experience at this level, I felt that arm was pretty sensible. Much better than I expected. And
so I abstracted the basic instruction classes a little and had the arm instructions implement them pretty much one
to one.
%p
Then I tried to implement any ruby logic in that abstraction and failed. Thus was born the virtual machine
abstraction of having Message, Frame and Self objects. This in turn mapped nicely to registers with indexed
addressing.
%h3#addressing Addressing
%p
I just have to sidestep here a little about addressing: the basic problem is of course that we have no idea at
compile-time at what address the executable will end up.
%p
The problem first emerged with calling functions. Mostly because those were the only objects I had, and so I was
very happy to find out about pc relative addressing, in which you jump or call relative to your current position
(
%strong> p
rogram
= succeed "ounter). Since the relation is not changed by relocation, all is well." do
%strong c
%p
Then came the first strings, and the approach can be extended: instead of grabbing some memory location, i.e. loading
an address and dereferencing, we calculate the address in relation to pc and then dereference. This is great and
works fine.
%p
But the smug smile is wiped off the face when one tries to store references. This came with the whole object
approach, the bootspace holding references to
%strong all
objects in the system. I even devised a plan to always store
relative addresses. Not relative to pc, but relative to the self that is storing. This I'm sure would have
worked fine too, but it does mean that the running program also has to store those relative addresses (or have
different address types, shudder). That was a runtime burden I was not willing to accept.
%p
So there are two choices as far as I see: use elf relocation, or relocate in init code. And yet again I find myself
biased towards the home-grown approach. Of course I see that this is partly because I don't want to learn the innards of
elf, as something very complicated that does a simple thing. But also, because it is so simple, I am hoping it isn't
such a big deal. Most of the code for it, object iteration, type testing, layout decoding, will be useful and
necessary later anyway.
%h3#concise-instruction-set Concise instruction set
%p
So that addressing aside was meant to further the point of a need for a good register instruction set (to write the
relocation in). And the code that I have been writing to implement the vm instructions clearly shows a need for
a better model at the register level.
%p
On the other hand, the idea of Passes will make it very easy to have a completely separate register machine layer.
We just transform the vm to that, and then later from that to arm (or later intel). So there are three things that I
am looking for with the new register machine instruction set:
%ul
%li easy to understand the model (ie register machine, pc, ..), free of real machine quirks
%li small set of instructions that is needed for our vm
%li better names for instructions
%p
Especially the last one: all the mvn and ldr is getting to me. It's so 50s, as if we didn't have the space to spell
out move or load. And even those are not good names; at least I am always wondering what is a move and what a load.
And as I explained above in the addressing, if I wanted to load an address of an object into a register with relative
addressing, I would actually have to do an add. But when reading an add instruction it is not an intuitive
conclusion that a load is meant. And since this is a fresh effort, I would rather change these things now and make
it easier for others to learn sensible stuff, than get used to cryptics only to have everyone after me do the same.
%p
So I will have instructions like RegisterMove, ConstantLoad and Branch, which will translate to mov, ldr and b in arm. I still like to keep the arm level with the traditional names, so people who actually know arm feel right at home.
But the extra register layer will make it easier for everyone who has not programmed assembler (and me!),
which I am guessing is quite a lot of people in the
%em ruby
community.
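%p
To make the mapping concrete, here is a sketch of what such a translation could look like. The instruction
classes and the Pass interface are assumptions, not the actual salama code.
%pre
%code
:preserve
# sketch only: all class names here are assumed
RegisterMove = Struct.new(:to, :from)
ConstantLoad = Struct.new(:to, :constant)
Branch       = Struct.new(:label)

class ArmTranslationPass
  def translate(instruction)
    case instruction
    when RegisterMove then "mov " + instruction.to.to_s + ", " + instruction.from.to_s
    when ConstantLoad then "ldr " + instruction.to.to_s + ", =" + instruction.constant.to_s
    when Branch       then "b "   + instruction.label.to_s
    else raise "unknown instruction"
    end
  end
end

puts ArmTranslationPass.new.translate(RegisterMove.new(:r1, :r2))   # => mov r1, r2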
%p
In implementation terms it is a relatively small step from the vm layer to the register layer. And an even smaller
one to the arm layer. But small steps are good, easy to take, easy to understand, no stumbling.
%h3#extra-benefits Extra Benefits
%p
As I am doing this for my own sanity, any additional benefits are really extra, for free as it were. And those extra
benefits clearly exist.
%h5#clean-interface-for-cpu-specific-implementation Clean interface for cpu specific implementation
%p
That really says it all. That interface was a bit messy, as the RegisterMachine was used in Vm code, but was actually
an Arm implementation. So, no separation. Also, as mentioned, the instruction set was arm-heavy, with the quirks
even arm has.
%p
So in the future any specific cpu implementation can be quite self-sufficient. The classes it uses don't need to
derive from anything specific and need only implement the very small code interface (position/length/assemble).
And to hook in, all that is needed is to provide a translation from RegisterMachine instructions, which can be
done very nicely by providing a Pass for every instruction. That layer of code is quite separate from the actual
assembler, so it should be easy to reuse existing code (like wilson or metasm).
%h5#reusable-optimisations Reusable optimisations
%p
Clearly the better separation allows for better optimisations. Concretely, Passes can be written to optimize the
RegisterMachine's workings: for example register use, constant extraction from loops, or folding of double
moves (when a value is moved from reg1 to reg2, and then from reg2 to reg3, and reg2 is never used again).
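%p
A sketch of that double-move folding in ruby (class names assumed, as above; a real pass would also have
to verify that the middle register is not read again afterwards):
%pre
%code
:preserve
RegisterMove = Struct.new(:to, :from)    # same assumed shape as in the sketch above

def fold_double_moves(instructions)
  folded = []
  instructions.each do |ins|
    prev = folded.last
    if prev.is_a?(RegisterMove) && ins.is_a?(RegisterMove) && prev.to == ins.from
      folded[-1] = RegisterMove.new(ins.to, prev.from)   # reg1 -> reg3 directly
    else
      folded << ins
    end
  end
  folded
end

moves = [RegisterMove.new(:r2, :r1), RegisterMove.new(:r3, :r2)]
p fold_double_moves(moves)    # => a single move, r1 -> r3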
%p
Such optimisations are very general and should then be reusable for specific cpu implementations. They are still
useful at RegisterMachine level, mind, as the code is “cleaner” there and it is easier to detect fluff. But the same
code may be run after a cpu translation, removing any “fluff” the translation introduced. Thus the translation
process may be kept simpler too, as it doesn't need to check for possible optimisations at the same time
as translating. Everyone wins :-)

View File

@ -1,96 +0,0 @@
The register machine abstraction has been somewhat thin, and it is time to change that
### Current affairs
When i started, i started from the assembler side, getting arm binaries working and off course learning the arm cpu
instruction set in assembler memnonics.
Not having **any** experience at this level i felt that arm was pretty sensible. Much better than i expected. And
so i abtracted the basic instruction classes a little and had the arm instructions implement them pretty much one
to one.
Then i tried to implement any ruby logic in that abstraction and failed. Thus was born the virtual machine
abstraction of having Message, Frame and Self objects. This in turn mapped nicely to registers with indexed
addressing.
### Addressing
I just have to sidestep here a little about addressing: the basic problem is off course that we have no idea at
compile-time at what address the executable will end up.
The problem first emerged with calling functions. Mostly because that was the only objects i had, and so i was
very happy to find out about pc relative addressing, in which you jump or call relative to your current position
(**p**rogram **c**ounter). Since the relation is not changed by relocation all is well.
Then came the first strings and the aproach can be extended: instead of grabbing some memory location, ie loading
and address and dereferencing, we calculate the address in relation to pc and then dereference. This is great and
works fine.
But the smug smile is wiped off the face when one tries to store references. This came with the whole object
aproach, the bootspace holding references to **all** objects in the system. I even devised a plan to always store
relative addresses. Not relative to pc, but relative to the self that is storing. This i'm sure would have
worked fine too, but it does mean that the running program also has to store those relative addresses (or have
different address types, shudder). That was a runtime burden i was not willing to accept.
So there are two choices as far as i see: use elf relocation, or relocate in init code. And yet again i find myself
biased to the home-growm aproach. Off course i see that this is partly because i don't want to learn the innards of
elf as something very complicated that does a simple thing. But also because it is so simple i am hoping it isn't
such a big deal. Most of the code for it, object iteration, type testing, layout decoding, will be useful and
neccessary later anyway.
### Concise instruction set
So that addressing aside was meant to further the point of a need for a good register instruction set (to write the
relocation in). And the code that i have been writing to implement the vm instructions clearly shows a need for
a better model at the register model.
On the other hand, the idea of Passes will make it very easy to have a completely sepeate register machine layer.
We just transfor the vm to that, and then later from that to arm (or later intel). So there are three things that i
am looking for with the new register machine instruction set:
- easy to understand the model (ie register machine, pc, ..), free of real machine quirks
- small set of instructions that is needed for our vm
- better names for instructions
Especially the last one: all the mvn and ldr is getting to me. It's so 50's, as if we didn't have the space to spell
out move or load. And even those are not good names, at least i am always wondering what is a move and what a load.
And as i explained above in the addressing, if i wanted to load an address of an object into a register with relative
addressing, i would actually have to do an add. But when reading an add instruction it is not an intuative
conclusion that a load is meant. And since this is a fresh effort i would rather change these things now and make
it easier for others to learn sensible stuff than me get used to cryptics only to have everyone after me do the same.
So i will have instructions like RegisterMove, ConstantLoad, Branch, which will translate to mov, ldr and b in arm. I still like to keep the arm level with the traditional names, so people who actually know arm feel right at home.
But the extra register layer will make it easier for everyone who has not programmed assembler (and me!),
which i am guessing is quite a lot in the *ruby* community.
In implementation terms it is a relatively small step from the vm layer to the register layer. And an even smaller
one to the arm layer. But small steps are good, easy to take, easy to understand, no stumbling.
### Extra Benefits
As i am doing this for my own sanity, any additional benefits are really extra, for free as it were. And those extra
benefits clearly exist.
##### Clean interface for cpu specific implementation
That really says it all. That interface was a bit messy, as the RegisterMachine was used in Vm code, but was actually
an Arm implementation. So no seperation. Also as mentioned the instruction set was arm heavy, with the quirks
even arm has.
So in the future any specific cpu implementation can be quite self sufficient. The classes it uses don't need to
derive from anything specific and need only implement the very small code interface (position/length/assemble).
And to hook in, all that is needed is to provide a translation from RegisterMachine instructions, which can be
done very nicely by providing a Pass for every instruction. So that layer of code is quite seperate from the actual
assembler, so it should be easy to reuse existing code (like wilson or metasm).
##### Reusable optimisations
Clearly the better seperation allows for better optimisations. Concretely Passes can be written to optimize the
RegiterMachine's workings. For example register use, constant extraction from loops, or folding of double
moves (when a value is moved from reg1 to reg2, and then from reg2 to reg3, and reg2 never being used).
Such optimisations are very general and should then be reusable for specific cpu implementations. They are still
usefull at RegiterMachine level mind, as the code is "cleaner" there and it is easier to detect fluff. But the same
code may be run after a cpu translation, removing any "fluff" the translation introduced. Thus the translation
process may be kept simpler too, as that doesn't need to check for possible optimisations at the same time
as translating. Everyone wins :-)

View File

@ -0,0 +1,28 @@
%p As before the original start of the project, I was on holiday for 6 weeks. The distance and lack of a computer really help.
%h3#review Review
%p So I printed most of the code and the book and went over it. And apart from abysmal spelling, I found one mistake in particular.
%p I had been going at the thing from the angle of producing binaries. Wrong approach.
%h4#ruby-is-dynamic Ruby is Dynamic
%p In fact ruby is so dynamic it is hard to think of anything that you need to do at compile time that you can't do at runtime.
%p
In other words,
%em all
functionality is available at run-time. That means it needs to be available in ruby, and since it is then available in ruby, one should reuse it. I had just sort of tried to avoid this, as it seemed so big.
%p In fact it is quite easy to express what needs to happen for e.g. a method call, in ruby. The hard thing is to use that code at compile time.
%h4#inlining Inlining
%p When I say hard, I mean hard to code. Actually it is quite easy to understand. One “just” needs to inline the code, easy actually. Of course I had known that inlining would be necessary in the end, I had just thought later would be fine. Well, it isn't. Of course, is it ever!
%p Inlining is making the functionality happen without initializing a method call and return. Of course this is only possible for known function calls, but that's enough. The objects/classes we use during method dispatch are well known, so everything can be resolved at compile time. Hunky dory. Just how?
%p As a first step we change the self, while saving the old self to a tmp. Then we have to deal with how the called function accesses variables (arguments or locals). We know it does this through the Message and Frame objects. But since those are different for an inlined function, we have to make them explicit arguments. So instead of the normal Message, we can create an InlinedMessage for an inlined function. When resolving a variable name, this InlinedMessage will look up the parent's variables and arrange access to them.
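%p
A tiny sketch of that lookup, with made-up classes (not the real implementation): the InlinedMessage just
delegates name resolution to its parent, possibly under a renamed slot.
%pre
%code
:preserve
class InlinedMessage
  def initialize(parent_message, renames = {})
    @parent  = parent_message
    @renames = renames          # e.g. the callee's :a is really the caller's :tmp_1
  end

  def resolve(name)
    @parent.resolve(@renames.fetch(name, name))
  end
end

# a fake parent, just so the sketch runs on its own
Parent = Struct.new(:slots) do
  def resolve(name)
    slots[name]
  end
end

caller_message = Parent.new({ :tmp_1 => 42 })
inlined = InlinedMessage.new(caller_message, { :a => :tmp_1 })
p inlined.resolve(:a)           # => 42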
%h4#changes Changes
%p So some of the concrete changes that will come once I've done all the cosmetic fixes:
%ul
%li much more parfait classes / functionality
%li remove all duplication in vm (that is now parfait)
%li change of compile, using explicit message/frames
%li explicit logic type (alongside integer + reference)
%p I also decided it would be cleaner to use the visitor pattern for compiling the ast to vm. In fact the directory should be named compile.
%p And I noticed that what I have called Builtin up to now is actually part of the Register machine layer (not the vm), so it needs to move there.
%h3#some-publicity Some publicity
%p
I have now given lightning talks at Frozen Rails 2014 and Ruby Bath 2015.
As 5 minutes is clearly not enough, I will work on a longer presentation.

View File

@ -1,41 +0,0 @@
As before the original start of the project, i was 6 weeks on holiday. The distance and lack of computer really helps.
### Review
So i printed most of the code and the book and went over it. And apart from abismal spelling i found especially one mistake.
I had been going at the thing from the angle of producing binaries. Wrong aproach.
#### Ruby is Dynamic
In fact ruby is so dynamic it is hard to think of anything that you need to do at compile time that you can't do at runtime.
In other words, *all* functionality is available at run-time. Ie it needs to be available in ruby, and since it then is available in ruby, one should reuse it. I had just sort of tried to avoid this, as it seemed so big.
In fact it is quite easy to express what needs to happed for eg. a method call, in ruby. The hard thing is to use that code at compile time.
#### Inlining
When i say hard, i mean hard to code. Actually it is quite easy to understand. One "just" needs to inline the code, easy actually. Off course i had known that inlining would be neccessary in the end, i had just thought later would be fine. Well, it isn't. Off course, is it ever!
Inlining is making the functionality happen, without initializing a method call and return. Off course this is only possible for known function calls, but that's enough. The objects/classes we use during method dispatch are well known, so everything can be resolved at compile time. Hunky dory. Just how?
As a first step we change the self, while saving the old self to a tmp. Then we have to deal with how the called function accesses variables (arguments or locals). We know it does this through the Message and Frame objects. But since those are different for an inlined function, we have to make them explicit arguments. So instead of the normal eg. Message, we can create an InlineMessage for inlined function. When resolving a variable name, this InlinedMessage will look up in the parents variables and arrange access to that.
#### Changes
So some of the concrete changes that will come once i've done all cosmetic fixes:
- much more parfait classes / functionality
- remove all duplication in vm (that is now parfait)
- change of compile, using explicit message/frames
- explicit logic type (alongside integer + reference)
I also decided it would be cleaner to use the visitor pattern for compiling the ast to vm. In fact the directory should be named compile.
And i noticed that what i have called Builtin up to now is actually part of the Register machine layer (not vm), so it needs to move there.
### Some publicity
I have now given lightning talk on Frozen Rails 2014 and Ruby Bath 2015.
As 5 Minutes is clearly now enough i will work on a longer presentation.

View File

@ -0,0 +1,71 @@
%p
Since I got the ideas of Slots and the associated instruction Set, I have been wondering how that
fits in with the code generation.
%p
I moved the patched AST compiler methods to a Compiler, OK. But still, what do all those compile
methods return?
%h2#expression Expression
%p
In ruby, everything is an expression. To recap: “Expressions have a value, while statements do not”,
or: statements represent actions while expressions represent values.
%p
So in ruby everything represents a value, including statements and functions. There is no such thing
as returning void as in C. Even loops and ifs result in a value: for a loop the last computed value,
and for an if the value of the branch taken.
%p
Having had only a vague grasp of this concept, I tried to somewhat haphazardly return the kind of value
that I thought appropriate. Sometimes literals, sometimes slots. Sometimes “Return”, a slot
representing the return value of a function.
%h2#return-slot Return slot
%p Today i realized that the Slot representing the return value is special.
%p It does not hold the value that is returned, but rather the other way around.
%p A function returns what is in the Return slot, at the time of return.
%p
From there it is easy to see that it must be the Return that holds the last computed value.
A function can return at any time after all.
%p
The last computed value is that of the Expression currently being evaluated. So the compile, which
initiates the evaluation, returns the Return slot. Always. Easy, simple, nice!
%h2#example Example
%p Constants: say the expression
%pre
%code
:preserve
true
%p would compile to a
%pre
%code
:preserve
ConstantLoad(ReturnSlot , TrueConstant)
%p While
%pre
%code
:preserve
2 + 4
%p would compile to
%pre
%code
:preserve
ConstantLoad(ReturnSlot , IntegerConstant(2))
Set(ReturnSlot , OtherSlot)
ConstantLoad(ReturnSlot , IntegerConstant(4))
Set(ReturnSlot , EvenOtherSlot)
MethodCall() # unspecified details here
%h2#optimisations Optimisations
%p
But, but, but, I hear, that is so totally inefficient. All the time we move data around, to and from
that one Return slot, just so that the return is simple. Yes, but no.
%p
It is very easy to optimize the trivial extra away. Many times the expression moves a value to Return
just to move it away in the next Instruction. A sequence like the one in the example above
%pre
%code
:preserve
ConstantLoad(ReturnSlot , IntegerConstant(2))
Set(ReturnSlot , OtherSlot)
%p can easily be optimized into
%pre
%code
:preserve
ConstantLoad(OtherSlot , IntegerConstant(2))
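%p
For illustration, a sketch of that rewrite in ruby. The class names are assumed, and SlotSet stands in for
the Set instruction above; a real pass would also check that ReturnSlot is not read in between.
%pre
%code
:preserve
ConstantLoad = Struct.new(:slot, :constant)
SlotSet      = Struct.new(:from, :to)      # stands in for the Set instruction

def fold_return_moves(instructions)
  out = []
  instructions.each do |ins|
    prev = out.last
    if prev.is_a?(ConstantLoad) && prev.slot == :ReturnSlot &&
       ins.is_a?(SlotSet) && ins.from == :ReturnSlot
      out[-1] = ConstantLoad.new(ins.to, prev.constant)   # load straight into the target slot
    else
      out << ins
    end
  end
  out
end

code = [ConstantLoad.new(:ReturnSlot, 2), SlotSet.new(:ReturnSlot, :OtherSlot)]
p fold_return_moves(code)    # => just ConstantLoad(:OtherSlot, 2)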
%p tbc

View File

@ -1,72 +0,0 @@
Since i got the ideas of Slots and the associated instruction Set, i have been wondering how that
fits in with the code generation.
I moved the patched AST compiler methods to a Compiler, ok. But still what do all those compile
methods return.
## Expression
In ruby, everything is an expression. To recap "Expressions have a value, while statements do not",
or statements represent actions while expressions represent values.
So in ruby everything represents a value, also statements, or functions. There is no such thing
as the return void in C. Even loops and ifs result in a value, for a loop the last computed value
and for an if the value of the branch taken.
Having had a vague grasp of this concept i tried to sort of haphazardly return the kind of value
that i though appropriate. Sometimes literals, sometimes slots. Sometimes "Return" , a slot
representing the return value of a function.
## Return slot
Today i realized that the Slot representing the return value is special.
It does not hold the value that is returned, but rather the other way around.
A function returns what is in the Return slot, at the time of return.
From there it is easy to see that it must be the Return that holds the last computed value.
A function can return at any time after all.
The last computed value is the Expression that is currently evaluated. So the compile, which
initiates the evaluation, returns the Return slot. Always. Easy, simple, nice!
## Example
Constants: say the expression
true
would compile to a
ConstantLoad(ReturnSlot , TrueConstant)
While
2 + 4
would compile to
ConstantLoad(ReturnSlot , IntegerConstant(2))
Set(ReturnSlot , OtherSlot)
ConstantLoad(ReturnSlot , IntegerConstant(4))
Set(ReturnSlot , EvenOtherSlot)
MethodCall() # unspecified details here
## Optimisations
But but but i hear that is so totally inefficient. All the time we move data around, to and from
that one Return slot, just so that the return is simple. Yes but no.
It is very easy to optimize the trivial extra away. Many times the expression moves a value to Return
just to move it away in the next Instruction. A sequence like in above example
ConstantLoad(ReturnSlot , IntegerConstant(2))
Set(ReturnSlot , OtherSlot)
can easily be optimized into
ConstantLoad(OtherSlot , IntegerConstant(2))
tbc

View File

@ -0,0 +1,57 @@
%p
Quite a while ago I
%a{:href => "/2014/06/27/an-exceptional-thought.html"} had already determined
that return
addresses and exceptional return addresses should be explicitly stored in the message.
%p
It was also clear that Message would have to be a linked list. Just managing that list at run-time
in Register Instructions (i.e. almost assembly) proved hard. Not that I was creating Message objects,
but I did shuffle their links about. I linked and unlinked messages by setting their next/prev fields
at runtime.
%h2#the-list-is-static The List is static
%p
Now I have realized that touching the list structure in any way at runtime is not necessary.
The list is completely static, i.e. created at compile time and never changed.
%p
To be more precise: I created the Messages at compile time and set them up as a forward linked list.
Each item had a
%em caller
field (a backlink) which I then filled at run-time. I was keeping the next
message to be used as a variable in the Space, and because that is basically global it was
relatively easy to update when making a call.
But I noticed when debugging that when I updated the message's next field, it was already set to
the value I was setting it to. And that made me stumble and think. Of course!
%p
It is the data
%strong in
the Messages that changes. But not the Message, nor the call chain.
%p
As a programmer one has the call graph in mind, and as that is a graph, I was thinking that the
Message list changes. But no. When working on one message, it is always the same message one sends
next. Just as one always returns to the same one that called.
%p It is the addresses and Method arguments that change, not the message.
%p
The best analogy I can think of is calling a friend. Whatever you say, it is always the same
number you call.
%p
Or in C terms, when using the stack (push/pop), it is not the stack memory that changes, only the
pointer to the top. A stack is an array, right, so the array stays the same,
even its size stays the same. Only the used part of it changes.
%h2#simplifies-call-model Simplifies call model
%p
Obviously this simplifies the way one thinks about calls. Just stick the data into the pre-existing
Message objects and go.
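%p
A sketch of the idea in plain ruby (the names are made up for illustration): the chain is wired once, and at
run-time only the data inside the pre-existing messages changes.
%pre
%code
:preserve
MessageNode = Struct.new(:caller, :next_message, :receiver, :arguments, :return_value)

def build_chain(length)
  chain = Array.new(length) { MessageNode.new }
  chain.each_cons(2) { |a, b| a.next_message = b; b.caller = a }   # wired once, at compile time
  chain.first
end

current = build_chain(16)
# "calling" just fills in the pre-existing next message: no allocation, no list surgery
current.next_message.receiver  = :some_object
current.next_message.arguments = [1, 2]
current = current.next_message
# "returning" is the reverse: read the return_value and step back via caller
current = current.caller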
%p
When I first had the
%a{:href => "/2014/06/27/an-exceptional-thought.html"} return address as argument
idea,
I was thinking that in the case of an exception one would have to garbage collect Messages.
In the same way I was thinking that they need to be dynamically managed.
%p
Wrong again. The message chain (doubly linked list, to be precise) stays. One just needs to clear
the data out of them, so that garbage does get collected. Anyway, it's all quite simple and that's
nice.
%p
As an upshot from this new simplicity we get
= succeed "." do
%strong speed
As the method enter and exit codes are 3-4 (arm) instructions, we are on par with c. Oh, and I forgot to mention
Frames. We don't need to generate those at run-time either. Every message gets a static Frame. Done. It is up to the
method what to do with it, i.e. don't use it, use it as an array, or create an array to store more than
fits into the static frame.

View File

@ -1,53 +0,0 @@
Quite long ago i [had already determined](/2014/06/27/an-exceptional-thought.html) that return
addresses and exceptional return addresses should be explicitly stored in the message.
It was also clear that Message would have to be a linked list. Just managing that list at run-time
in Register Instructions (ie almost assembly) proved hard. Not that i was creating Message objects
but i did shuffle their links about. I linked and unlinked messages by setting their next/prev fields
at runtime.
## The List is static
Now i realized that touching the list structure in any way at runtime is not necessary.
The list is completely static, ie created at compile time and never changed.
To be more precise: I created the Messages at compile time and set them up as a forward linked list.
Each Item had *caller* field (a backlink) which i then filled at run-time. I was keeping the next
message to be used as a variable in the Space, and because that is basically global it was
relatively easy to update when making a call.
But i noticed when debugging that when i updated the message's next field, it was already set to
the value i was setting it to. And that made me stumble and think. Off course!
It is the data **in** the Messages that changes. But not the Message, nor the call chain.
As programmer one has the call graph in mind and as that is a graph, i was thinking that the
Message list changes. But no. When working on one message, it is always the same message one sends
next. Just as one always returns to the same one that called.
It is the addresses and Method arguments that change, not the message.
The best analogy i can think of is when calling a friend. Whatever you say, it is alwas the same
number you call.
Or in C terms, when using the stack (push/pop), it is not the stack memory that changes, only the
pointer to the top. A stack is an array, right, so the array stays the same,
even it's size stays the same. Only the used part of it changes.
## Simplifies call model
Obviously this simplifies the way one thinks about calls. Just stick the data into the pre-existing
Message objects and go.
When i first had the [return address as argument](/2014/06/27/an-exceptional-thought.html) idea,
i was thinking that in case of exception one would have to garbage collect Messages.
In the same way that i was thinking that they need to be dynamically managed.
Wrong again. The message chain (double linked list to be precise) stays. One just needs to clear
the data out from them, so that garbage does get collected. Anyway, it's all quite simple and that's
nice.
As an upshot from this new simplicity we get **speed**. As the method enter and exit codes are
3-4 (arm) instructions, we are on par with c. Oh and i forgot to mention Frames. Don't need to
generate those at run-time either. Every message gets a static Frame. Done. Up to the method
what to do with it. Ie don't use it or use it as array, or create an array to store more than
fits into the static frame.

View File

@ -0,0 +1,60 @@
%hr/
%p
After almost a year of rewrite:
%strong Hello World
is back.
%p
%strong Working executables again
%p
So much has changed in the last year it is almost impossible to recap.
Still a little summary:
%h3#register-machine Register Machine
%p
The whole layer of the
%a{:href => "/2014/09/30/a-better-register-machine.html"} Register Machine
as an
abstraction was not there. It was impossible to see what was happening.
%h3#passes Passes
%p
In the beginning I was trying to
= succeed "." do
%em just do it
Just compile the vm down to arm instructions. But the human brain (or possibly just mine) is not made to think
in terms of process. I think much better in terms of Structure. So I made vm and register instructions and
%a{:href => "/2014/07/05/layers-vs-passes.html"} implemented Passes
to go between them.
%h3#the-virtual-machine-design The virtual machine design
%p
Thinking about what objects make up a virtual machine has brought me to a clear understanding
of the
= succeed "." do
%a{:href => "/2014/09/12/register-allocation-reviewed.html"} objects needed
In fact things got even simpler than stated in that post, as I have
%a{:href => "/2014/06/27/an-exceptional-thought.html"} stopped using the machine stack
altogether and am using a linked list instead.
Recently it has occurred to me that that linked list
%a{:href => "/06/20/the-static-call-chain.html"}> doesn't even change
, so it is very simple indeed.
%h3#smaller-though-not-small-changes Smaller, though not small, changes
%ul
%li
The
%a{:href => "/2014/08/19/object-storage.html"} Salma Object File
format was created.
%li
The
%a{:href => "http://dancinglightning.gitbooks.io/the-object-machine/content/"} Book
was started
%li I gave lightning talks at Frozen Rails 2014, Helsinki and Bath Ruby 2015
%li I presented at Munich and Zurich user groups, lots to take home from all that
%h3#future Future
%p
The mountain is still oh so high, but at last there is hope again. The second dip into arm
(gdb) debugging has made it very clear that a debugger is needed. Preferably visual, possibly 3d,
definitely browser-based. So either Opal or even Volt.
%p Already more clarity in upcoming fields has arrived:
%ul
%li inlining is high on the list, to code in a higher-level language
%li
the difference between
%a{:href => "/2015/05/20/expression-is-slot.html"} statement and expression
helped
to structure code.
%li hopefully the debugger / interpreter will help to write better tests too.

View File

@ -1,49 +0,0 @@
---
After almost a year of rewrite: **Hello World** is back.
**Working executables again**
So much has changed in the last year it is almost impossible to recap.
Still a little summary:
### Register Machine
The whole layer of the [Register Machine](/2014/09/30/a-better-register-machine.html) as an
abstraction was not there. Impossible is was to see what was happening.
### Passes
In the beginning i was trying to *just do it*. Just compile the vm down to arm instructions.
But the human brain (or possibly just mine) is not made to think in terms of process.
I think much better in terms of Structure. So i made vm and register instructions and
[implemented Passes](/2014/07/05/layers-vs-passes.html) to go between them.
### The virtual machine design
Thinking about what objects makes up a virtual machine has brought me to a clear understanding
of the [objects needed](/2014/09/12/register-allocation-reviewed.html).
In fact things got even simpler as stated in that post, as i have
[stopped using the machine stack](/2014/06/27/an-exceptional-thought.html)
altogether and am using a linked list instead.
Recently is has occurred to me that that linked list
[doesn't even change](/06/20/the-static-call-chain.html), so it is very simple indeed.
### Smaller, though not small, changes
- The [Salma Object File](/2014/08/19/object-storage.html) format was created.
- The [Book](http://dancinglightning.gitbooks.io/the-object-machine/content/) was started
- I gave lightning talks at Frozen Rails 2014, Helsinki and Bath Ruby 2015
- I presented at Munich and Zurich user groups, lots to take home from all that
### Future
The mountain is still oh so high, but at last there is hope again. The second dip into arm
(gdb) debugging has made it very clear that a debugger is needed. Preferably visual, possibly 3d,
definitely browser based. So either Opal or even Volt.
Already more clarity in upcoming fields has arrived:
- inlining is high on the list, to code in higher language
- the difference between [statement and expression](/2015/05/20/expression-is-slot.html) helped
to structure code.
- hopefully the debugger / interpreter will help to write better tests too.

View File

@ -0,0 +1,72 @@
%p
It really is like
%a{:href => "http://worrydream.com/#!/InventingOnPrinciple"} Bret Victor
says in his video:
good programmers are the ones who play computer in their head well.
%p Why? Because you have to, to program. And of course that's what I'm doing.
%p
But when it got to debugging, it got a bit much. Using gdb for non-C code; I mean, it's bad enough
for C code.
%h2#the-debugger The debugger
%p
The process of getting my “hello world” to work was quite hairy, what with debugging with gdb
and checking registers and stuff. Brr.
%p
The idea for a “solution”, my own debugger, possibly graphical, came quite quickly. But the effort seemed a
little big. It took a while, but then I started.
%p
I fiddled a little with fancy 2d or even 3d representations but couldn't get things to work.
Also, getting used to running ruby in the browser, with opal, took a while.
%p
But now there is a
%a{:href => "https://github.com/ruby-x/salama-debugger"} basic frame
up,
and I can see registers swishing around, and ideas of what needs
to be visualized, and partly even how, are gushing. Of course it's happening in html,
but that's OK for now.
%p
And the best thing: I found my first serious
%strong bug
visually. Very satisfying.
%p
I do so hope someone will pick this up and run with it. I'll put it on the site as soon as the first
program runs through.
%h2#interpreter Interpreter
%p
Of course, to have a debugger I needed to start on an interpreter.
Now it wasn't just the technical challenge, but some resistance against interpreting, since the whole
idea of salama was to compile. But in the end it is a very different level that the interpreter
works at. I chose to put it at the register level (not the arm level), so it would be useful for future
cpus, and because the register to arm mapping is mainly about naming, not functionality. I.e. it is
pretty much one to one.
%p
But of course (he says after the fact), the interpreter solves a large part of the testing
issue. Because I wasn't really happy with tests, and that was because I didn't have a good
idea of how to test. Sure, unit tests, fine. But to write all the little unit tests and hope the
total will result in what you want never struck me as a good plan.
%p
Instead I tend to write system tests, and drop down to unit tests to find the bugs in system tests.
But I had no good system tests, other than running the executable. But
= succeed "." do
%strong now I do
I can just run the Interpreter on a program and
see if it produced the right output. And by right output I really just mean stdout.
%p
So, two flies with one stone (oh, I don't know how this goes, I'm not English): better tests and visual
feedback, both driving the process at double speed.
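%p
The kind of system test this enables, sketched with minitest. The Interpreter and compile calls are
assumptions about the api, purely to show the shape of such a test; the real names may well differ.
%pre
%code
:preserve
require "minitest/autorun"

class HelloWorldTest < Minitest::Test
  def test_hello_world_output
    # Interpreter and compile are hypothetical stand-ins for salama's real api
    interpreter = Interpreter.new(compile("puts 'Hello World'"))
    interpreter.run
    assert_equal "Hello World\n", interpreter.stdout   # "right output" really just means stdout
  end
end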
%p
Now I “just” need a good way to visualize a static and a running program (implement breakpoints,
build a class and object inspector, recompile on edit . . .).
%h2#debugger-rewritten-thrice Debugger rewritten, thrice
%p
Update: after trying around with a
%a{:href => "https://github.com/orbitalimpact/opal-pixi"} 2d graphics
implementation, I have rewritten the ui in
%a{:href => "https://github.com/catprintlabs/react.rb"} react
,
%a{:href => "https://github.com/voltrb/volt"} Volt
and
= succeed "." do
%a{:href => "https://github.com/opal/opal-browser"} OpalBrowser
%p
The last is the one that got the easiest-to-understand code. It also has the fewest dependencies, namely
only opal and opal-browser. Opal-browser is a small opal wrapper around the browser's
javascript functionality.

View File

@ -1,63 +0,0 @@
It really is like [Bret Victor](http://worrydream.com/#!/InventingOnPrinciple) says in his video:
good programmers are the ones who play computer in their head well.
Why? Because you have to, to program. And off course that's what i'm doing.
But when it got to debugging, it got a bit much. Using gdb for non C code, i mean it's bad enough
for c code.
## The debugger
The process of getting my "hello world" to work was quite hairy, what with debugging with gdb
and checking registers and stuff. Brr.
The idea for a "solution", my own debugger, possibly graphical, came quite quickly. But the effort seemed a
little big. It took a little, but then i started.
I fiddled a little with fancy 2 or even 3d representations but couldn't get things to work.
Also getting used to running ruby in the browser, with opal, took a while.
But now there is a [basic frame](https://github.com/ruby-x/salama-debugger) up,
and i can see registers swishing around and ideas of what needs
to be visualized and partly even how, are gushing. Off course it's happening in html,
but that ok for now.
And the best thing: I found my first serious **bug** visually. Very satisfying.
I do so hope someone will pick this up and run with it. I'll put it on the site as soon as the first
program runs through.
## Interpreter
Off course to have a debugger i needed to start on an interpreter.
Now it wasn't just the technical challenge, but some resistance against interpreting, since the whole
idea of salama was to compile. But in the end it is a very different level that the interpreter
works at. I chose to put it at the register level (not the arm), so it would be useful for future
cpu's, and because the register to arm mapping is mainly about naming, not functionality. Ie it is
pretty much one to one.
But off course (he says after the fact), the interpreter solves a large part of the testing
issue. Because i wasn't really happy with tests, and that was because i didn't have a good
idea how to test. Sure unit tests, fine. But to write all the little unit tests and hope the
total will result in what you want never struck me as a good plan.
Instead i tend to write system tests, and drop down to unit tests to find the bugs in system tests.
But i had no good system tests, other than running the executable. But **now i do**.
I can just run the Interpreter on a program and
see if it produced the right output. And by right output i really just mean stdout.
So two flies with one (oh i don't know how this goes, i'm not english), better test, and visual
feedback, both driving the process at double speed.
Now i "just" need a good way to visualize a static and running program. (implement breakpoints,
build a class and object inpector, recompile on edit . . .)
## Debugger rewritten, thrice
Update: after trying around with a [2d graphics](https://github.com/orbitalimpact/opal-pixi)
implementation i have rewritten the ui in [react](https://github.com/catprintlabs/react.rb) ,
[Volt](https://github.com/voltrb/volt) and [OpalBrowser](https://github.com/opal/opal-browser).
The last is what got the easiest to understand code. Also has the least dependencies, namely
only opal and opal-browser. Opal-browser is a small opal wrapper around the browsers
javascript functionality.

View File

@ -0,0 +1,143 @@
%p
It is the
%strong one
  thing i said i wasn't going to do: Write a language.
  There are too many languages out there already, and just because i want to write a vm,
  doesn't mean i want to add to the language jungle.
%strong But
%h2#the-gap The gap
%p
  As it happens in life, which is why they say never to say never, it happened just the way
  i didn't want. It turns out the semantic gap of what i have is too large.
%p
There is the
%strong register level
, which is approximately assembler, and there is the
%strong vm level
which is more or less the ruby level. So my head hurts from trying to implement ruby in assembler,
no wonder.
%p
Having run into this wall, which btw is the same wall that crystal ran into, one can see the sense
  in what others have done more clearly: Why rubinius uses c++ underneath. Why crystal does not
implement ruby, but a statically typed language. And ultimately why there is no ruby compiler.
The gap is just too large to bridge.
%h2#the-need-for-a-language The need for a language
%p
  As I have the architecture of passes, i was hoping to get by with just another layer in the
  architecture. A tried and tested approach after all. And while i won't say that that isn't a
  possibility, i just don't see it. I think it may be one of those where hindsight will be perfect.
%p
I can see as far as this: If i implement a language, that will mean a parser, ast and compiler.
The target will be my register layer. So a reasonable step up is a sort of object c, that has
  basic integer maths and object access. I'll detail that more below, but the point is, if i have
that, i can start writing a vm implementation in that language.
%p
Off course the vm implementation involves a parser, an ast and a compiler, unless we go to the free
compilers (see below). And so implementing the vm in a new language is in essence swapping nodes of
the higher level tree with nodes of the lower level (c-ish) one. Ie parsing should not strictly
speaking be necessary. This node swapping is after all what the pass architecture was designed
  to do. But, as i said, i just can't see that happening (yet?).
%h3#trees-vs-blocks Trees vs. Blocks
%p
Speaking of the Pass architecture: I flopped. Well, maybe not so much with the actual Passes, but
with the Method representation. Blocks holding Instructions, and being in essence a list.
Misinformed copying from llvm, misinformed by the final outcome. Off course the final binary
has a linear address space, but that is where the linearity ends. The natural structure of code
is a tree, not a list, as demonstrated by the parse
  = succeed "." do
    %em tree
  Flattening it just creates navigational problems. Also as a mental model it is easier, as it is
  easy to imagine swapping out subtrees, expanding or collapsing nodes etc.
%h2#soml---salama-object-machine-language Soml - Salama Object Machine Language
%h3#typed Typed
%p
Quite a while before crystallizing into the idea of a new language, i already saw the need for a type
system. Off course, and this dates back to the first memory layouts. But i mean the need for a
%em strong typing
  system, or maybe it's even clearer to call it compile time typing. The type that c
and c++ have. It is essential (mentally, this is off course all for the programmer, not the computer)
to be able to think in a static type system, and then extend that and make it dynamic.
Or possibly use it in a dynamic way.
%p
  This is a good example of that too-large gap, where one just steps on quicksand if everything is
  dynamic all the time.
%p
The way i had the implementation figured was to have different versions of the same function. In
  each function we would have compile time types, everything known. I'll probably still do that,
just written in Soml.
%h3#machine-language Machine language
%p
Soml is a machine language for the Salama machine. As i tried to implement without this layer, i was
essentially implementing in assembler. Too much.
%p
  There are two main features we need from the machine language: one is a typed oo memory model,
  the other an oo call model.
%h3#object-c Object c
%p
  The language needs to be object based, off course. Just because it's typed and not dynamic
  and closer to assembler, doesn't mean we need to give up objects. In fact we mustn't. Soml
should be a little bit like c++, ie compile time known variable arrangement and types,
objects. But no classes (or inheritance), more like structs, with full access to everything.
So a struct.variable syntax would mean grab that variable at that address, no functions, no possible
override, just get it. This is actually already implemented as i needed it for the slot access.
%p So objects without encapsulation or classes. A lower level object orientation.
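%p
  To make the struct-style access concrete, here is a small plain ruby illustration (the names
  are made up for this post, none of this is actual project code): the usual ruby attribute goes
  through an overridable method, while the soml style access is meant as a plain read of a known slot.
%pre
  :preserve
    # ruby side: access is a method call that could be overridden
    class Point
      attr_reader :x, :y
      def initialize(x, y)
        @x, @y = x, y
      end
    end
    point = Point.new(1, 2)
    point.x                          # method dispatch
    # struct side: a fixed layout of names to slots, and a plain indexed read
    POINT_LAYOUT = { :x => 0, :y => 1 }
    point_slots  = [1, 2]
    point_slots[POINT_LAYOUT[:x]]    # "grab that variable at that address"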
%h3#whitequark Whitequark
%p
This new approach (and more experience) shed a new light on ruby parsing. The previous idea was to
start small, write the necessary stuff in the parsable subset and with time expand that set.
%p
Alas . . ruby is a beast to parse, and because of the
%strong semantic gap
writing the system,
even in a subset, is not viable. And it turns out the brave warriors of the ruby community have
already produced a pure, production ready,
  = succeed "." do
    %a{:href => "https://github.com/whitequark/parser"} ruby parser
  That can obviously read itself and anything else, so the start small approach is doubly out.
%h3#interoperability Interoperability
%p
The system code needs to be callable from the higher level, and possibly the other way around.
  This probably means the same or compatible calling mechanism and data model. The data model is
  quite simple, as at the system level all is just machine words, but in object sized
  packets. As for the calling it will probably mean that the same message object needs to be used
  and what is now called calling at the machine level is supported. Sending off course won't be.
%h3#still-missing-a-piece Still missing a piece
%p
How the level below calling can be represented is still open. It is clear though that it does need
to be present, as otherwise any kind of concurrency is impossible to achieve. The question ties
in with the still open question of
  = succeed "." do
    %a{:href => "http://valerieaurora.org/synthesis/SynthesisOS/ch4.html"} Quajects
  Meaning, what is the yin in the yin and yang of object oriented programming. The normal yang way
  sees the code as active and the data as passive. By normal i mean oo implementations in which
  blocks and closures just fall from the sky and have no internal structure. There is obviously a
  piece of the puzzle missing that Alexia was onto.
%h3#start-small Start small
%p The first next step is to wrap the functionality i have in the Passes as a language.
%p Then to expand that language, by writing increasingly more complex programs in it.
%p
And then to re-attack ruby using the whitequark parser, that probably means jumping on the
mspec train.
%p All in all, no biggie :-)
%h2#compilers-are-not-free Compilers are not free
%p
  Oh and i re-read and re-watched Tom's
%a{:href => "http://codon.com/compilers-for-free"} compilers for free
talk,
which did make quite an impression on me the first time. But when i really thought about actually
  going down that road (who doesn't enjoy a free beer), i got into the small print.
%p
The second biggest of which is that writing a partial evaluator is just about as complicated
as writing a compiler.
%p
But the biggest problem is that the (free) compiler you could get, has the implementation language
of the evaluator, as its
  = succeed "." do
    %strong output
  You need a compiler to start with, in other words. Also the interpreter would have to be
  written in the same compilable language. So writing a ruby compiler by writing a ruby
  interpreter would mean writing the interpreter in c, and (worse) writing the partial evaluator
  %em for
  c, not for ruby.
%p
Ok, maybe it is not quite as bad as that makes it sound. As i do have the register layer ready
and will be writing a c-ish language, it may even be possible to write an interpreter
  = succeed "," do
    %strong in soml
  and then it would be ok to write an evaluator
  %strong for soml
  too.
%p
I will nevertheless go the straighter route for now, ie write a compiler, and maybe return to the
promised freebie later. It does feel like a lot of what the partial evaluator is, would be called
  compiler optimization in another lingo. So maybe the road will lead there naturally.

View File

@ -1,144 +0,0 @@
It is the **one** thing i said i wasn't going to do: Write a language.
There are too many languages out there already, and just because i want to write a vm,
doesn't mean i want to add to the language jungle.
**But** ...
## The gap
As it happens in life, which is why they say never to say never, it happens just like it
i didn't want. It turns out the semantic gap of what i have is too large.
There is the **register level** , which is approximately assembler, and there is the **vm level**
which is more or less the ruby level. So my head hurts from trying to implement ruby in assembler,
no wonder.
Having run into this wall, which btw is the same wall that crystal ran into, one can see the sense
in what others have done more clearly: Why rubinus uses c++ underneath. Why crystal does not
implement ruby, but a statically typed language. And ultimately why there is no ruby compiler.
The gap is just too large to bridge.
## The need for a language
As I have the architecture of passes, i was hoping to get by with just another layer in the
architecture. A tried an tested approach after all. And while i won't say that that isn't a
possibility, i just don't see it. I think it may be one of those where hindsight will be perfect.
I can see as far as this: If i implement a language, that will mean a parser, ast and compiler.
The target will be my register layer. So a reasonable step up is a sort of object c, that has
basic integer maths and object access. I'll detail that more below, but the point is, if i have
that, i can start writing a vm implementation in that language.
Off course the vm implementation involves a parser, an ast and a compiler, unless we go to the free
compilers (see below). And so implementing the vm in a new language is in essence swapping nodes of
the higher level tree with nodes of the lower level (c-ish) one. Ie parsing should not strictly
speaking be necessary. This node swapping is after all what the pass architecture was designed
to do. But, as i said, i just can't see that happening (yet?).
### Trees vs. Blocks
Speaking of the Pass architecture: I flopped. Well, maybe not so much with the actual Passes, but
with the Method representation. Blocks holding Instructions, and being in essence a list.
Misinformed copying from llvm, misinformed by the final outcome. Off course the final binary
has a linear address space, but that is where the linearity ends. The natural structure of code
is a tree, not a list, as demonstrated by the parse *tree*. Flattening it just creates navigational
problems. Also as a metal model it is easier, as it is easy to imagine swapping out subtrees,
expanding or collapsing nodes etc.
## Soml - Salama Object Machine Language
### Typed
Quite a while before crystallizing into the idea of a new language, i already saw the need for a type
system. Off course, and this dates back to the first memory layouts. But i mean the need for a
*strong typing* system, or maybe it's even clearer to call it compile time typing. The type that c
and c++ have. It is essential (mentally, this is off course all for the programmer, not the computer)
to be able to think in a static type system, and then extend that and make it dynamic.
Or possibly use it in a dynamic way.
This is a good example of this too big gap, where one just steps on quicksand if everything is
all the time dynamic.
The way i had the implementation figured was to have different versions of the same function. In
each function we would have compile time types, everything known. I'll probably still do that,
just written in Soml.
### Machine language
Soml is a machine language for the Salama machine. As i tried to implement without this layer, i was
essentially implementing in assembler. Too much.
There are two main feature we need from the machine language, one is typed a typed oo memory model,
the other an oo call model.
### Object c
The language needs to be object based, off course. Just because it's typed and not dynamic
and closer to assembler, doesn't mean we need to give up objects. In fact we mustn't. Soml
should be a little bit like c++, ie compile time known variable arrangement and types,
objects. But no classes (or inheritance), more like structs, with full access to everything.
So a struct.variable syntax would mean grab that variable at that address, no functions, no possible
override, just get it. This is actually already implemented as i needed it for the slot access.
So objects without encapsulation or classes. A lower level object orientation.
### Whitequark
This new approach (and more experience) shed a new light on ruby parsing. The previous idea was to
start small, write the necessary stuff in the parsable subset and with time expand that set.
Alas . . ruby is a beast to parse, and because of the **semantic gap** writing the system,
even in a subset, is not viable. And it turns out the brave warriors of the ruby community have
already produced a pure, production ready, [ruby parser](https://github.com/whitequark/parser).
That can obviously read itself and anything else, so the start small approach is doubly out.
### Interoperability
The system code needs to be callable from the higher level, and possibly the other way around.
This probably means the same or compatible calling mechanism and data model. The data model is
quite simple as the at the system level all is just machine words, but in object sized
packets. As for the calling it will probably mean that the same message object needs to be used
and what is now called calling at the machine level is supported. Sending off course won't be.
### Still missing a piece
How the level below calling can be represented is still open. It is clear though that it does need
to be present, as otherwise any kind of concurrency is impossible to achieve. The question ties
in with the still open question of [Quajects](http://valerieaurora.org/synthesis/SynthesisOS/ch4.html).
Meaning, what is the yin in the yin and yang of object oriented programming. The normal yang way sees
the code as active and the data as passive. By normal i mean oo implementations in which blocks and
closures just fall from the sky and have no internal structure. There is obviously a piece of
the puzzle missing that Alexia was onto.
### Start small
The first next step is to wrap the functionality i have in the Passes as a language.
Then to expand that language, by writing increasingly more complex programs in it.
And then to re-attack ruby using the whitequark parser, that probably means jumping on the
mspec train.
All in all, no biggie :-)
## Compilers are not free
Oh and i re-read and re-watched Toms [compilers for free](http://codon.com/compilers-for-free) talk,
which did make quite an impression on me the first time. But when i really thought about actually
going down that road (who does't enjoy a free beer), i got into the small print.
The second biggest of which is that writing a partial evaluator is just about as complicated
as writing a compiler.
But the biggest problem is that the (free) compiler you could get, has the implementation language
of the evaluator, as it's **output**. You need a compiler to start with, in other words.
Also the interpreter would have to be written in the same compilable language.
So writing a ruby compiler by writing a ruby interpreter would mean
writing the interpreter in c, and (worse) writing the partial evaluator *for* c, not for ruby.
Ok, maybe it is not quite as bad as that makes it sound. As i do have the register layer ready
and will be writing a c-ish language, it may even be possible to write an interpreter **in soml**,
and then it would be ok to write an evaluator **for soml** too.
I will nevertheless go the straighter route for now, ie write a compiler, and maybe return to the
promised freebie later. It does feel like a lot of what the partial evaluator is, would be called
compiler optimization in another lingo. So may be road will lead there naturally.

View File

@ -0,0 +1,72 @@
%p
Ok, that was surprising: I just wrote a language in two months. Parser, compiler, working binaries
and all.
%p
Then i
%a{:href => "/typed/typed.html"} documented it
, detailed the
%a{:href => "/typed/syntax.html"} syntax
and even did
some
  = succeed "." do
    %a{:href => "/typed/benchmarks.html"} benchmarking
  Speed is luckily roughly where i wanted it. Mostly (only mostly?) slower than c, but only by
  about 50, very understandable, percent. It is doing things in a more roundabout, and easier to
  understand way, and lacking any optimisation. It means you can do about a million fibonacci(20)
  in a second on a pi, and beat ruby at it by about a factor of 20.
%p
  So, the good news:
  %strong it works
%p
Working means: calling works, if, while, assignment, class and method definition. The benchmarks
were hello world and fibonacci, both recursive and by looping.
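%p
  For the curious, a minimal ruby sketch of the two fibonacci shapes the benchmark exercised,
  recursive and looping. The actual tests are written in the typed language, so this only mirrors
  their structure:
%pre
  :preserve
    # plain ruby sketch of the two benchmarked variants
    def fib_recursive(n)
      return n if n < 2
      fib_recursive(n - 1) + fib_recursive(n - 2)
    end
    def fib_looping(n)
      a, b = 0, 1
      n.times { a, b = b, a + b }
      a
    end
    fib_recursive(20)   # => 6765
    fib_looping(20)     # => 6765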
%p
I even updated the
%a{:href => "/book.html"}
%strong whole book
to be up to date. Added a Soml section, updated
parfait, rewrote the register level . . .
%h3#it-all-clicked-into-place It all clicked into place
%p
  To be fair, i don't think anyone writes a language that isn't a toy in 2 months, and it was only
possible because a lot of the stuff was there already.
%ul
%li
%a{:href => "/typed/parfait.html"} Parfait
was pretty much there. Just consolidated it as it is all just adapter.
%li
The
%a{:href => "/typed/debugger.html"} Register abstraction
(bottom) was there.
%li Using the ast library made things easier.
%li
A lot of the
%a{:href => "https://github.com/ruby-x/salama-reader"} parser
could be reused.
%p And off course the second time around everything is easier (aka hindsight is perfect).
%p
One of the better movie lines comes to mind,
(
%a{:href => "http://www.imdb.com/title/tt1341188/quotes"}> paraphrased
) “We are all just one small
adjustment away from making our code work”. It was a step sideways in the head which brought a leap
forward in terms of direction. Not where i was going but where i wanted to go.
%h3#open-issues Open issues
%p
  Clearly i had wobbled on the parfait front. Now it's clear it will have to be recoded in soml,
and then re-translated into ruby. But it was good to have it there in ruby all the time for the
concepts to solidify.
%p
  Typing is not completely done, and negative tests for types are non-existent. Exceptions and
  the machinery for returns are also still missing.
%p
  I did a nice framework for testing the binaries on a remote machine; it would be nice to have it
on travis. But my image is over 2Gb.
%h3#and-onto-the-next-compiler And onto the next compiler
%p
The ideas about how to compile ruby into soml have been percolating and are waiting to be put to
action.
%a{:href => "http://book.salama-vm.org/object/dynamic_types.html"} The theory
  looks good, but one has
to see it to believe it.
%p
The first steps are quite clear though. Get the
%a{:href => "https://github.com/whitequark/parser"} ruby parser
integrated, get the compiler up, start with small tests. Work the types at the same time.
%p And let the adventure continue.

View File

@ -1,57 +0,0 @@
Ok, that was surprising: I just wrote a language in two months. Parser, compiler, working binaries
and all.
Then i [documented it](/typed/typed.html) , detailed the [syntax](/typed/syntax.html) and even did
some [benchmarking](/typed/benchmarks.html). Speed is luckily roughly where i wanted it. Mostly
(only mostly?) slower than c, but only by about 50, very understandable percent. It is doing
things in a more roundabout, and easier to understand way, and lacking any optimisation. It means
you can do about a million fibonacci(20) in a second on a pi, and beat ruby at it by a about
a factor of 20.
So, the good news: it **it works**
Working means: calling works, if, while, assignment, class and method definition. The benchmarks
were hello world and fibonacci, both recursive and by looping.
I even updated the [**whole book**](/book.html) to be up to date. Added a Soml section, updated
parfait, rewrote the register level . . .
### It all clicked into place
To be fair, i don't think anyone writes a language that isn't a toy in 2 months, and it was only
possible because a lot of the stuff was there already.
- [Parfait](/typed/parfait.html) was pretty much there. Just consolidated it as it is all just adapter.
- The [Register abstraction](/typed/debugger.html) (bottom) was there.
- Using the ast library made things easier.
- A lot of the [parser](https://github.com/ruby-x/salama-reader) could be reused.
And off course the second time around everything is easier (aka hindsight is perfect).
One of the better movie lines comes to mind,
([paraphrased](http://www.imdb.com/title/tt1341188/quotes)) "We are all just one small
adjustment away from making our code work". It was a step sideways in the head which brought a leap
forward in terms of direction. Not where i was going but where i wanted to go.
### Open issues
Clearly i had wobbled on the parfait front. Now it's clear it will have to be recoded in soml,
and then re-translated into ruby. But it was good to have it there in ruby all the time for the
concepts to solidify.
Typing is not completely done, and negative tests for types are non existant. Also exceptions and
the machinery for the returns.
I did a nice framework for testing the binaries on a remote machine, would be nice to have it
on travis. But my image is over 2Gb.
### And onto the next compiler
The ideas about how to compile ruby into soml have been percolating and are waiting to be put to
action. [The theory](http://book.salama-vm.org/object/dynamic_types.html) looks good,but one has
to see it to believe it.
The first steps are quite clear though. Get the [ruby parser](https://github.com/whitequark/parser)
integrated, get the compiler up, start with small tests. Work the types at the same time.
And let the adventure continue.

View File

@ -0,0 +1,63 @@
%p
Writing Soml helped a lot to separate the levels, or phases of the ruby compilation process. Helped
me that is, to plan the ruby compiler.
%p
But off course i had not written the ruby compiler, i have only
%a{:href => "https://dancinglightning.gitbooks.io/the-object-machine/content/object/dynamic_types.html"} planned
how the dynamic nature could be implemented, using soml. In very short summary, the plan was to
  extend soml's features with esoteric multi-return features and use that to jump around different
  implementations when types change.
%h2#the-benefit-of-communication The benefit of communication
%p
But first a thanks. When i was in the US, i talked to quite a few people about my plans. Everything
helped, but special thanks goes to Caleb for pointing out two issues.
%p
The simpler one is that what i had named Layout, is usually called Type. I have changed the code
and docs now and must admit it is a better name.
%p
The other thing Caleb was right about is that Soml is what is called an intermediate representation.
This rubbed a little, especially since i had just moved away from a purely intermediate
  representation to an intermediate language. But still, we'll see below that the language is not
enough to solve the dynamic issues. I have already created an equivalent intermediate
representation (to the soml ast) and will probably let go of the language completely, in time.
%p
So thanks to Caleb, and a thumbs up for anyone else reading, to
%strong make contact
%h2#the-hierarchy-of-languages The hierarchy of languages
%p
It seemed like such a good idea. Just like third level languages are compiled down to second (ie c
to assembler), and second is compiled to first (ie assembler to binary), so fourth level would
get compiled down to third. Such a nice clean world, very appealing.
%p
Until i started working on the details. Specifically how the type (of almost anything) would change
in a statically typed language. And in short, I ran into yet another wall.
%p
So back it is to using an intermediate representation. Alas, at least it is a working one, so down
  from there to executable, it is known to work.
%h2#cross-function-jumps Cross function jumps
%p
  Let's call a method something akin to what ruby has. It's bound to a type, has a name and arguments.
  But both return types and argument types are not specified. A function, then, would be a specific
  implementation of that method, specific to a certain set of types for the arguments. The return
  type is still not fixed.
%p
A compiler can generate all possible functions for a method as the set of basic types is small. Or
it could be a little cleverer and generate stubs and generate the actual functions on demand, as
probably only a fraction of the theoretical possibilities will be needed.
%p
Now, if we have an assignment, say to an argument, from a method call, the type of the variable
may have to change according to the return type.
So the return will be to different addresses (think of it as an if) and so in each branch,
code can be inserted to change the type. But that makes the rest of the function behave wrongly as
it assumes the type before the change.
%p
And this is where the cross function jumps come. Which is also the reason this can not be expressed
in a language. The code then needs to jump to the same place, in a different function.
%p
The function can be pre-compiled or compiled on demand at that point. All that matters is that the
logic of the function being jumped to is the same as where the jump comes from. And this is
  guaranteed by the fact that both functions are generated from the same (untyped ruby) source code.
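%p
  A rough ruby approximation of the “return as an if” part (illustrative only: the compiled
  version continues in a sibling function generated from the same source, instead of re-checking
  the type in place like the case below does):
%pre
  :preserve
    # the callee may return either of two types, the caller cannot know which
    def parse_setting(raw)
      Integer(raw) rescue raw
    end
    # the case stands in for the two return addresses the compiler would use
    def describe(raw)
      value = parse_setting(raw)
      case value
      when Integer then "number: " + (value + 1).to_s
      else              "text: " + value.upcase
      end
    end
    describe("41")   # => "number: 42"
    describe("on")   # => "text: ON"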
%h2#next-steps Next steps
%p So what's left to do here: There is the little matter of implementing this plan.
%p Maybe it leads to another wall, maybe this is it. Fingers crossed.

View File

@ -1,66 +0,0 @@
Writing Soml helped a lot to separate the levels, or phases of the ruby compilation process. Helped
me that is, to plan the ruby compiler.
But off course i had not written the ruby compiler, i have only
[planned](https://dancinglightning.gitbooks.io/the-object-machine/content/object/dynamic_types.html)
how the dynamic nature could be implemented, using soml. In very short summary, the plan was to
extend somls feature with esoteric multi-return features and use that to jump around different
implementation when types change.
## The benefit of communication
But first a thanks. When i was in the US, i talked to quite a few people about my plans. Everything
helped, but special thanks goes to Caleb for pointing out two issues.
The simpler one is that what i had named Layout, is usually called Type. I have changed the code
and docs now and must admit it is a better name.
The other thing Caleb was right about is that Soml is what is called an intermediate representation.
This rubbed a little, especially since i had just moved away from a purely intermediate
representation to an intermediate language. But still, we'll see below that the language is not
enough to solve the dynamic issues. I have already created an equivalent intermediate
representation (to the soml ast) and will probably let go of the language completely, in time.
So thanks to Caleb, and a thumbs up for anyone else reading, to **make contact**
## The hierarchy of languages
It seemed like such a good idea. Just like third level languages are compiled down to second (ie c
to assembler), and second is compiled to first (ie assembler to binary), so fourth level would
get compiled down to third. Such a nice clean world, very appealing.
Until i started working on the details. Specifically how the type (of almost anything) would change
in a statically typed language. And in short, I ran into yet another wall.
So back it is to using an intermediate representation. Alas, at least it is a working one, so down
from there to executable, it is know to work.
## Cross function jumps
Let's call a method something akin to what ruby has. It's bound to a type, has a name and arguments.
But both return types and argument types are not specified. Then function could be a specific
implementation of that method, specific to a certain set of types for the arguments. The return type
is still not fixed.
A compiler can generate all possible functions for a method as the set of basic types is small. Or
it could be a little cleverer and generate stubs and generate the actual functions on demand, as
probably only a fraction of the theoretical possibilities will be needed.
Now, if we have an assignment, say to an argument, from a method call, the type of the variable
may have to change according to the return type.
So the return will be to different addresses (think of it as an if) and so in each branch,
code can be inserted to change the type. But that makes the rest of the function behave wrongly as
it assumes the type before the change.
And this is where the cross function jumps come. Which is also the reason this can not be expressed
in a language. The code then needs to jump to the same place, in a different function.
The function can be pre-compiled or compiled on demand at that point. All that matters is that the
logic of the function being jumped to is the same as where the jump comes from. And this is
guaranteed by the fact that both function are generated from the same (untyped ruby) source code.
## Next steps
So what's left to do here: There is the little matter of implementing this plan.
Maybe it leads to another wall, maybe this is it. Fingers crossed.

View File

@ -0,0 +1,108 @@
%p So, the plan, in short:
%ol
%li I need to work a little more on docs. Reading them i notice they are still not up to date
%li The Type system needs work
%li The Method-Function relationship needs to be created
%li Ruby compiler needs to be written
%li Parfait moves back completely into ruby land
%li Soml parser should be scrapped (or will become redundant by 2-4)
%li The memory model needs reworking (global not object based memory)
%h3#type-system 2. Type system
%p
A Type is an ordered list of associations from name to BasicType (Object/Integer). The class exists
off course and has been implemented as an array with the names and BasicTypes laid out in sequence.
This is basically fine, but support for navigation is missing.
%p
  The whole type system is basically a graph. A type
%em A
is connected to a type
%em B
if it has exactly
one different BasicType. So
%em A
needs to have
%strong exactly
the same names, and
%strong exactly
one
different BasicType. Another way of saying this is that the two types are related if in the class
that Type represents, exactly one variable changes type. This is off course exactly what happens
when an assignment assigns a different type.
%p
%em A
and
%em B
are also related when
%em A
has exactly one more name entry than
%em B
  , but is otherwise
  identical. This is what happens when a new variable is added to a class, or one is removed.
%p
The implementation needs to establish this graph (possibly lazily), so that the traversal is fast.
The most likely implementation seems a hash, so a hashing function has to be designed and the equals
implemented.
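%p
  A hedged sketch of the hashing and equality part, in plain ruby with made up names (this is not
  the parfait code): a type is the ordered name to BasicType mapping, and two types are equal
  exactly when those mappings are equal, which is what the hash has to reflect.
%pre
  :preserve
    class SketchType
      attr_reader :members              # e.g. { :name => :Object, :age => :Integer }
      def initialize(members)
        @members = members.freeze       # immutable, so the hash stays valid
      end
      def hash
        members.hash
      end
      def eql?(other)
        other.is_a?(SketchType) && members == other.members
      end
      alias_method :==, :eql?
      # the related type where exactly one variable changes its BasicType
      def with(name, basic_type)
        SketchType.new(members.merge(name => basic_type))
      end
    end
    a = SketchType.new({ :name => :Object, :age => :Integer })
    b = a.with(:age, :Object)
    a == SketchType.new({ :name => :Object, :age => :Integer })   # => true
    a == b                                                        # => false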
%h3#method-function-relationship 3. Method-Function relationship
%p
Just to get the naming clear: A method is at the ruby level, untyped. A Class has references to
Methods.
%p
Whereas a Function is at the level below, fully typed.
  A Function's arguments and local variables have a BasicType.
  A Type has references to Functions.
%p
  A Function's type is fully described by the combination of the arguments' Type and the Frame Type.
The Frame object is basically a wrapper for all local variables.
%p
A (ruby) Method has N Function “implementations”. One function for each different combination of
BasicTypes for arguments and local variables. Functions know which Method they belong to, because
their parent Type class holds a reference to the Class that the Type describes.
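%p
  In ruby pseudo code the relationship could look something like the sketch below (the class
  names are illustrative, not the actual ones): a method keeps its typed functions keyed by the
  argument and frame types, and creates them on demand.
%pre
  :preserve
    class SketchMethod
      attr_reader :name, :functions
      def initialize(name)
        @name = name
        @functions = {}               # [args_type, frame_type] => function
      end
      def function_for(args_type, frame_type)
        @functions[[args_type, frame_type]] ||=
          SketchFunction.new(self, args_type, frame_type)
      end
    end
    class SketchFunction
      attr_reader :method, :args_type, :frame_type
      def initialize(method, args_type, frame_type)
        @method, @args_type, @frame_type = method, args_type, frame_type
      end
    end
    plus = SketchMethod.new(:plus)
    int_version = plus.function_for({ :other => :Integer }, {})
    obj_version = plus.function_for({ :other => :Object }, {})
    plus.functions.size               # => 2
    int_version.method.equal?(plus)   # => true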
%h3#ruby-compiler 4. Ruby Compiler
%p
Currently there is only the Function level and the soml compiler. The ruby level / compiler is
missing.
%p
The Compiler generates Class objects, and Type objects as far as it can determine name and
BasicTypes of the instance variables.
%p
  Then it creates Method objects for every Method parsed. Finally it must create all functions that
  are needed. In a first brute-force approach this may mean creating functions for all possible
type combinations.
%p
Call-sites must then be “linked”. Linking here refers to the fact that the compiler can not
determine how to call a function before it is created. So the functions get created in a first pass
and calls and returns “linked” in a second. The return addresses used at the “soml” level are
dependent on the BasicType that is being returned. This involves associating the code labels (at
the function level) with the ast nodes they come from (at the method level). With this, the compiler
ensures that the type of the variable receiving the return value is correct.
%h3#parfait-in-ruby 5. Parfait in ruby
%p
  After SOML was originally written, parts of the run-time (parfait) were ported to soml. This was done with the
idea that the run-time is low level and thus needs to be fully typed. As it turns out this is only
partly correct, in the sense that there needs to exist Function definitions (in the sense above)
that implement basic functionality. But as the sub-chapter on the ruby compiler should explain,
  this does not mean the code has to be written in a typed language.
%p
After the ruby-compiler is implemented, the run-time can be implemented in ruby. While this may seem
strange at first, one must remember that the ruby-compiler creates N Functions of each method for
all possible type combinations. This means if the ruby method is correctly implemented, error
handling, for type errors, will be correctly generated by the compiler.
%h3#soml-goodbye 6. SOML goodbye
%p
By this time the soml language can be removed. Meaning the parser for the language and all
  documentation is not needed. The ruby-compiler compiles straight into the soml internal
  representation (as the soml parser did) and because parfait is back in ruby land, soml should be
removed. Which is a relief, because there are definitely enough languages in the world.
%h3#memory-model-rework 7. Memory model rework
%p
Slightly unrelated to the above (read: can be done at the same time), the memory model needs to be
expanded. The current per object
%em fake
memory works fine, but leaves memory management in
the compiler.
%p
Since ultimately memory management should be part of the run-time, the model needs to be changed
to a global one. This means class Page and Space should be implemented, and the
%em fake
memory
mapped to a global array.
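%p
  As a rough sketch of that direction (all names and numbers below are assumptions, only Space
  and Page come from the post): one global word array plays the role of memory, Space hands out
  Pages, and a Page hands out object addresses within its range.
%pre
  :preserve
    class SketchSpace
      WORDS = Array.new(1024, 0)          # the one global "memory"
      def initialize(page_size = 64)
        @page_size = page_size
        @pages_handed_out = 0
      end
      def get_page
        start = @pages_handed_out * @page_size
        @pages_handed_out += 1
        SketchPage.new(start, @page_size)
      end
    end
    class SketchPage
      def initialize(start, size)
        @start, @size, @used = start, size, 0
      end
      def allocate(words)                 # returns the address of the new object
        raise "page full" if @used + words > @size
        address = @start + @used
        @used += words
        address
      end
    end
    space = SketchSpace.new
    page  = space.get_page
    int   = page.allocate(2)              # a two word object: type plus value
    SketchSpace::WORDS[int]     = :Integer_type
    SketchSpace::WORDS[int + 1] = 42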

View File

@ -1,94 +0,0 @@
So, the plan, in short:
1. I need to work a little more on docs. Reading them i notice they are still not up to date
2. The Type system needs work
3. The Method-Function relationship needs to be created
4. Ruby compiler needs to be written
5. Parfait moves back completely into ruby land
6. Soml parser should be scrapped (or will become redundant by 2-4)
7. The memory model needs reworking (global not object based memory)
### 2. Type system
A Type is an ordered list of associations from name to BasicType (Object/Integer). The class exists
off course and has been implemented as an array with the names and BasicTypes laid out in sequence.
This is basically fine, but support for navigation is missing.
The whole type system is basically graph. A type *A* is connected to a type *B* if it has exactly
one different BasicType. So *A* needs to have **exactly** the same names, and **exactly** one
different BasicType. Another way of saying this is that the two types are related if in the class
that Type represents, exactly one variable changes type. This is off course exactly what happens
when an assignment assigns a different type.
*A* and *B* are also related when *A* has exactly one more name entry than *B* , but os otherwise
identical. This is what happens when a new variable is added too a class, or one is removed.
The implementation needs to establish this graph (possibly lazily), so that the traversal is fast.
The most likely implementation seems a hash, so a hashing function has to be designed and the equals
implemented.
### 3. Method-Function relationship
Just to get the naming clear: A method is at the ruby level, untyped. A Class has references to
Methods.
Whereas a Function is at the level below, fully typed.
Function's arguments and local variables have a BasicType.
Type has references to Functions.
A Function's type is fully described by the combination of the arguments Type and the Frame Type.
The Frame object is basically a wrapper for all local variables.
A (ruby) Method has N Function "implementations". One function for each different combination of
BasicTypes for arguments and local variables. Functions know which Method they belong to, because
their parent Type class holds a reference to the Class that the Type describes.
### 4. Ruby Compiler
Currently there is only the Function level and the soml compiler. The ruby level / compiler is
missing.
The Compiler generates Class objects, and Type objects as far as it can determine name and
BasicTypes of the instance variables.
Then it creates Method objects for every Method parsed. Finally it must create all functions that
needed. In a first brute-force approach this may mean creating functions for all possible
type combinations.
Call-sites must then be "linked". Linking here refers to the fact that the compiler can not
determine how to call a function before it is created. So the functions get created in a first pass
and calls and returns "linked" in a second. The return addresses used at the "soml" level are
dependent on the BasicType that is being returned. This involves associating the code labels (at
the function level) with the ast nodes they come from (at the method level). With this, the compiler
ensures that the type of the variable receiving the return value is correct.
### 5. Parfait in ruby
After SOML was originally written, parts of the run-time (parfait) was ported to soml. This was done with the
idea that the run-time is low level and thus needs to be fully typed. As it turns out this is only
partly correct, in the sense that there needs to exist Function definitions (in the sense above)
that implement basic functionality. But as the sub-chapter on the ruby compiler should explain,
this does not mean the code has to written in a typed language.
After the ruby-compiler is implemented, the run-time can be implemented in ruby. While this may seem
strange at first, one must remember that the ruby-compiler creates N Functions of each method for
all possible type combinations. This means if the ruby method is correctly implemented, error
handling, for type errors, will be correctly generated by the compiler.
### 6. SOML goodbye
By this time the soml language can be removed. Meaning the parser for the language and all
documentation is not needed. The ruby-complier compilers straight into the soml internal
representation (as the soml parser) and because parfait is back in ruby land, soml should be
removed. Which is a relief, because there are definitely enough languages in the world.
### 7. Memory model rework
Slightly unrelated to the above (read: can be done at the same time), the memory model needs to be
expanded. The current per object *fake* memory works fine, but leaves memory management in
the compiler.
Since ultimately memory management should be part of the run-time, the model needs to be changed
to a global one. This means class Page and Space should be implemented, and the *fake* memory
mapped to a global array.

View File

@ -0,0 +1,61 @@
%h2#rubyx-compiles-ruby-to-binary RubyX compiles ruby to binary
%p
The previous name was from a time in ancient history, three years ago, in internet time over
a decade (X years!). From when i thought i was going to build
a virtual machine. It has been clear for a while that what i am really doing is building a
compiler. A new thing needs a new name and finally inspiration struck in the form of RubyX.
%p
  It's a bit of a shame that both domain and github were taken, but the - versions work well too.
Renaming of the organization, repositories and changing of domain is now complete. I did not
rewrite history, so all old posts still refer to salama.
%p
What i like about the new name most, is the closeness to ruby, this is after all an implementation
  of ruby. Also the unclarity of what the X is is nice: is it as in X-files, the unknown of the
  maths variable or, a la mac, the 10 for a version number? Or the hope of achieving 10 times
  performance as a play on the 3 times performance of ruby 3. It's a mystery, but it is a ruby
mystery and that is the main thing.
%h3#type-system 2. Type system
%p About the work that has been done, the type system rewrite is probably the biggest.
%p
Types are now immutable throughout the system, and the space keeps a list of all unique types.
Adding, removing, changing type all goes through a hashing process and leads to a unique
instance, that may have to be created.
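%p
  In effect the space acts as an interning table. A hedged ruby sketch of just that rule, with
  types reduced to frozen hashes for brevity (the real types are objects, this only shows the
  uniqueness part):
%pre
  :preserve
    class UniqueTypes
      def initialize
        @types = {}                            # members => the one frozen instance
      end
      def for(members)
        @types[members] ||= members.dup.freeze # create only on first request
      end
    end
    registry = UniqueTypes.new
    t1 = registry.for({ :value => :Integer })
    t2 = registry.for({ :value => :Integer })
    t3 = registry.for(t1.merge({ :value => :Object }))  # "changing" yields another unique type
    t1.equal?(t2)   # => true, same instance
    t1.equal?(t3)   # => false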
%h3#typedmethod-arguments-and-locals 3. TypedMethod arguments and locals
%p
  Close on the heel of the type immutability was the change to types as argument and local variable
descriptors. A type instance is now used to describe the arguments (names and types) uniquely,
clearing up previous imprecision.
%p
Argument and locals type, along with the name of the method describe a method uniquely. Obviously
the types may not be changed. Methods with different argument types are thus different methods, a
fact that still has to be coded into the ruby compiler.
%h3#arguments-and-calling-convention 4. Arguments and calling convention
%p
The Message used to carry the arguments, while locals were a separate frame object. An imbalance
if one thinks about closures, as both have to be decoupled from their activation.
%p
Now both arguments and locals are represented as NamedLists, which are basically just objects.
The type is transferred from the method to the NamedList instance at call time, so it is available
at run-time. This makes the whole calling convention easier to understand.
%h3#parfait-in-ruby 5. Parfait in ruby
%p
Parfait is more normal ruby now, specifically we are using instance variables in Parfait again,
  just like in any ruby. When compiling we have to deal with the mapping to indexes, but that's what
we have types for, so no problem. The new version simplifies the boot process a little too.
%p Positioning has been removed from Parfait completely and pushed into the Assembler where it belongs.
%h3#soml-goodbye 6. SOML goodbye
%p
  All traces of the soml language have been eradicated. All that is left is an intermediate typed
  tree representation. But the MethodCompiler still generates binary, so that's good.
Class and method generation capabilities have been removed from that compiler and now live
one floor up, at the ruby level.
%h3#ruby-compiler 7. Ruby Compiler
%p
Finally work on the ruby compiler has started and after all that ground work is actually quite easy.
Class statements create classes already. Method definitions extract their argument and local
variable names, and create their representation as RubyMethod. More to come.
%p
  All in all almost all of the previous post's todos are done. Next up is the fanning of RubyMethods
into TypedMethods by instantiating type variations. When compilation of those works, i just need
to implement the cross function jumps and voila.
%p Certainly an interesting year ahead.

View File

@ -1,71 +0,0 @@
## RubyX compiles ruby to binary
The previous name was from a time in ancient history, three years ago, in internet time over
a decade (X years!). From when i thought i was going to build
a virtual machine. It has been clear for a while that what i am really doing is building a
compiler. A new thing needs a new name and finally inspiration struck in the form of RubyX.
It's a bit of a shame that both domain and github were taken, but the - versions work well too.
Renaming of the organization, repositories and changing of domain is now complete. I did not
rewrite history, so all old posts still refer to salama.
What i like about the new name most, is the closeness to ruby, this is after all an implementation
of ruby. Also the unclarity of what the X is is nice, is it as in X-files, the unknown of the
maths variable or ala mac, the 10 for a version number? Or the hope of achieving 10 times
performance as a play on the 3 times performance of ruby 3. It's a mystery, but it is a ruby
mystery and that is the main thing.
### 2. Type system
About the work that has been done, the type system rewrite is probably the biggest.
Types are now immutable throughout the system, and the space keeps a list of all unique types.
Adding, removing, changing type all goes through a hashing process and leads to a unique
instance, that may have to be created.
### 3. TypedMethod arguments and locals
Close on the heal of the type immutability was the change to types as argument and local variable
descriptors. A type instance is now used to describe the arguments (names and types) uniquely,
clearing up previous imprecision.
Argument and locals type, along with the name of the method describe a method uniquely. Obviously
the types may not be changed. Methods with different argument types are thus different methods, a
fact that still has to be coded into the ruby compiler.
### 4. Arguments and calling convention
The Message used to carry the arguments, while locals were a separate frame object. An imbalance
if one thinks about closures, as both have to be decoupled from their activation.
Now both arguments and locals are represented as NamedList's, which are basically just objects.
The type is transferred from the method to the NamedList instance at call time, so it is available
at run-time. This makes the whole calling convention easier to understand.
### 5. Parfait in ruby
Parfait is more normal ruby now, specifically we are using instance variables in Parfait again,
just like in any ruby. When compiling we have to deal with the mapping to indexes, but that's what
we have types for, so no problem. The new version simplifies the boot process a little too.
Positioning has been removed from Parfait completely and pushed into the Assembler where it belongs.
### 6. SOML goodbye
All trances of the soml language have been eradicated. All that is left is an intermediate typed
tree representation. But the MethodCompiler still generates binary so that's good.
Class and method generation capabilities have been removed from that compiler and now live
one floor up, at the ruby level.
### 7. Ruby Compiler
Finally work on the ruby compiler has started and after all that ground work is actually quite easy.
Class statements create classes already. Method definitions extract their argument and local
variable names, and create their representation as RubyMethod. More to come.
All in all almost all of the previous posts todos are done. Next up is the fanning of RubyMethods
into TypedMethods by instantiating type variations. When compilation of those works, i just need
to implement the cross function jumps and voila.
Certainly an interesting year ahead.

View File

@ -0,0 +1,122 @@
%p
  I just read mri 2.4 “unifies” Fixnum and Integer. This, it turns out, is something quite
  different from what i thought, mostly about which class names are returned.
And that it is ok to have two implementations for the same class, Integer.
%p
  But even if it wasn't what i thought, it did spark an idea, and i hope a solution to a problem
  that i have seen lurking ahead. Strangely the solution may be even more radical than the
cross function jumps it replaces.
%h2#a-problem-lurking-ahead A problem lurking ahead
%p As i have been thinking more about what happens when a type changes, i noticed something:
%p
An object may change its type in one method (A), but may be used in a method (B), far up the call
  stack. How does B know to treat the object differently? Specifically, the calls B makes
on the object are determined by the type before the change. So they will be wrong after the change,
and so B needs to know about the type change.
%p
Such a type change was supposed to be handled by a cross method jump, thus fixing the problem
in A. But the propagation to B is cumbersome, there can be just so many of them.
  Anything that i thought of is quite a bit too involved. And this is before even thinking about closures.
%h2#a-step-back A step back
%p
Looking at this from a little higher vantage there are maybe one too many things i have been trying
to avoid.
%p
The first one was the bit-tagging. The ruby (and smalltalk) way of tagging an integer
  with a marker bit. Thus losing a bit and gaining a gazillion type checks. In mri c land
an object is a VALUE, and a VALUE is either a tagged integer or a pointer to an object struct.
So on
%strong every
  operation the bit has to be checked. Both of these i've been trying to avoid.
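%p
  For reference, the mri scheme sketched in ruby (the real thing is a couple of c macros,
  INT2FIX and friends; this just models the bit fiddling):
%pre
  :preserve
    FIXNUM_FLAG = 0x1
    def to_value(n)                 # integer -> tagged machine word
      (n << 1) | FIXNUM_FLAG
    end
    def fixnum?(value)              # the check that runs on every operation
      (value & FIXNUM_FLAG) == FIXNUM_FLAG
    end
    def value_to_int(value)         # tagged word -> integer
      value >> 1
    end
    v = to_value(21)
    fixnum?(v)                      # => true
    value_to_int(v)                 # => 21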
%p
  So that led to a system with no explicit information in the lowest level representation and
thus a large dance to have that information in an external type system and keeping that type
information up to date.
%p
  Off course the elephant in the room here is that i have also been trying to avoid making integers and
floats objects. Ie keeping their c, or machine representation, just like anyone else before me.
Too wasteful to even think otherwise.
%h2#and-a-step-forward And a step forward
%p
The inspiration that came by reading about the unification of integers was exactly that:
%strong to unify integers
\. Unifying with objects, ie
%strong making integers objects
%p
I have been struggling with the dichotomy between integer and objects for a long time. There always
seemed something so fundamentally wrong there. Ok, maybe if the actual hardware would do the tagging
and that continuous checking, then maybe. But otherwise: one is a direct, the other an indirect
value. It just seemed wrong.
%p
Making Integers (and floats etc) first class citizens, objects with a type, resolves the chasm
very nicely. Off course it does so at a price, but i think it will be worth it.
%h2#the-price-of-unification The price of Unification
%p
Initially i wanted to make all objects the size of a cache line or multiples thereof. This is
  something i'll have to let go of: Integer objects should naturally be 2 words, namely the type
and the actual value.
%p
So this is doubling the amount of ram used to represent integers. But maybe worse, it makes them
subject to garbage collection. Both can probably be alleviated by having the first 256 pinned, ie
a fixed array, but still.
%p
  Also using a dedicated memory manager for them and keeping a pool of unused ones as a linked list
should make it quick. And off course the main hope lies in the fact that your average program
nowadays (especially oo) does not really use integers all that much.
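%p
  A very rough sketch of such a manager (everything below is an assumption about a possible
  implementation, none of it exists yet): two word integer objects, the first 256 pinned, the
  rest recycled over a free list.
%pre
  :preserve
    class IntObject
      attr_accessor :type, :value, :next_free
      def initialize(value)
        @type, @value = :Integer_type, value
      end
    end
    class IntegerPool
      PINNED = (0..255).map { |i| IntObject.new(i) }.freeze
      def initialize
        @free = nil                        # head of the linked free list
      end
      def get(value)
        return PINNED[value] if value >= 0 && value <= 255
        if @free
          object = @free
          @free = object.next_free
          object.value = value
          object
        else
          IntObject.new(value)             # pool empty, allocate fresh
        end
      end
      def release(object)
        object.next_free = @free
        @free = object
      end
    end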
%h2#oo-to-the-rescue OO to the rescue
%p
  Off course this is not the first time my thoughts have strayed that way. There are two reasons why
they quickly scuttled back home to known territory before. The first was the automatic optimization
reflex: why use 2 words for something that can be done in one, and all that gc on top.
%p
  But the second was probably even more important: If we then have the value inside the object
  (as a sort of instance variable or array element), then when we return it we have the “naked”
  integer wreaking havoc in our system, as the code expects objects everywhere.
  And if we don't return it, then how do operations happen, since machines only operate on values.
%p
The thing that i had not considered is that that line of thinking is mixing up the levels
of abstraction. It assumes a lower level than one needs: What is needed is that the system
  knows about integer objects (in a similar way that the other way assumes knowledge of integer
  values).
%p
Concretely the “machine”, or compiler, needs to be able to perform the basic Integer operations,
on the Integer objects. This is really not so different from it knowing how to perform the
operations on two values. It just involves getting the actual values from the object and
putting them back.
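%p
  Spelled out in ruby, “plus on integer objects” amounts to no more than this (a sketch of the
  semantics only; in the real system this would be generated code, not ruby):
%pre
  :preserve
    IntegerObject = Struct.new(:type, :value)
    def new_integer(value)
      IntegerObject.new(:Integer_type, value)
    end
    # get the two private values, use the machine add, put the result into a fresh object
    def int_plus(a, b)
      new_integer(a.value + b.value)
    end
    sum = int_plus(new_integer(40), new_integer(2))
    sum.value   # => 42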
%p
OO helps in another way that never occurred to me.
%strong Data hiding:
we never actually pass out
  the value. The value is private to the object and not accessible from the outside. In fact it is not
  even accessible from the inside to the object itself. Admittedly this means more functionality in
  the compiler, but since that is a solved problem (see builtin), it's ok.
%h2#unified-method-caching Unified method caching
%p
So having gained this unification, we can now determine the type of an object very very easily.
The type will
%em always
  be the first word of the memory that the object occupies. We don't have
immediate values anymore, so always is always.
%p
This is
%em very
handy, since we have given up being god and thus knowing everything at any time.
In concrete terms this means that in a method, we can
%em not
know what type an object is.
  In fact it's worse, we can't even say what type it is, even if we have checked it, once we
  have passed it as an argument to another method.
%p
  Luckily programs are not random, and it is quite rare for an object to change type, and so a given
object will usually have one of a very small set of types. This can be used to do method caching.
Instead of looking up the method statically and calling it unconditionally at run-time, we will
need some kind of lookup at run-time.
%p
The lookup tables can be objects that the method carries. A small table (3 entries) with pairs of
type vs jump address. A little assembler to go through the list and jump, or in case of a miss
jump to some handler that does a real lookup in the type.
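%p
  Modelled in ruby, such a per call-site cache could look like the sketch below (illustrative
  only; the real version would be a handful of assembler instructions and jump addresses
  instead of lambdas):
%pre
  :preserve
    class CallSiteCache
      SIZE = 3
      def initialize(method_name)
        @method_name = method_name
        @entries = []                    # [ [type, callable], ... ] most recent first
      end
      def dispatch(receiver, *args)
        type  = receiver.class           # stand in for "the first word of the object"
        entry = @entries.find { |cached_type, _| cached_type == type }
        callable = entry ? entry.last : miss(type)
        callable.call(receiver, *args)
      end
      private
      def miss(type)                     # the real lookup in the type, then cache it
        unbound  = type.instance_method(@method_name)
        callable = lambda { |receiver, *args| unbound.bind(receiver).call(*args) }
        @entries.unshift([type, callable])
        @entries.pop if @entries.size > SIZE
        callable
      end
    end
    cache = CallSiteCache.new(:to_s)
    cache.dispatch(42)        # miss, fills the first entry
    cache.dispatch(7)         # hit on the Integer entry
    cache.dispatch("hello")   # miss, second entry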
%p
In a distant future a smaller version may be created. For the case where the type has been
checked already during the method, a further check may be inlined completely into the code and
  only revert to the table in case of a miss. But that's down the road a bit.
%p Next question: How does this work with Parfait. Or the interpreter??

View File

@ -1,117 +0,0 @@
I just read mri 2.4 "unifies" Fixnum and Integer. This, it turns out, is something quite
different from what i though, mostly about which class names are returned.
And that it is ok to have two implementations for the same class, Integer.
But even it wasn't what i thought, it did spark an idea, and i hope a solution to a problem
that i have seen lurking ahead. Strangely the solution maybe even more radical than the
cross function jumps it replaces.
## A problem lurking ahead
As i have been thinking more about what happens when a type changes, i noticed something:
An object may change it's type in one method (A), but may be used in a method (B), far up the call
stack. How does B know to treat the object different. Specifically, the calls B makes
on the object are determined by the type before the change. So they will be wrong after the change,
and so B needs to know about the type change.
Such a type change was supposed to be handled by a cross method jump, thus fixing the problem
in A. But the propagation to B is cumbersome, there can be just so many of them.
Anything that i though of is quite a bit too involved. And this is before even thinking about closures.
## A step back
Looking at this from a little higher vantage there are maybe one too many things i have been trying
to avoid.
The first one was the bit-tagging. The ruby (and smalltalk) way of tagging an integer
with a marker bit. Thus loosing a bit and gaining a gazillion type checks. In mri c land
an object is a VALUE, and a VALUE is either a tagged integer or a pointer to an object struct.
So on **every** operation the bit has to be checked. Both of these i've been trying to avoid.
So that lead to a system with no explicit information in the lowest level representation and
thus a large dance to have that information in an external type system and keeping that type
information up to date.
Off course the elephant in the room here is that i have also be trying to avoid making integers and
floats objects. Ie keeping their c, or machine representation, just like anyone else before me.
Too wasteful to even think otherwise.
## And a step forward
The inspiration that came by reading about the unification of integers was exactly that:
**to unify integers** . Unifying with objects, ie **making integers objects**
I have been struggling with the dichotomy between integer and objects for a long time. There always
seemed something so fundamentally wrong there. Ok, maybe if the actual hardware would do the tagging
and that continuous checking, then maybe. But otherwise: one is a direct, the other an indirect
value. It just seemed wrong.
Making Integers (and floats etc) first class citizens, objects with a type, resolves the chasm
very nicely. Off course it does so at a price, but i think it will be worth it.
## The price of Unification
Initially i wanted to make all objects the size of a cache line or multiples thereof. This is
something i'll have to let go of: Integer objects should naturally be 2 words, namely the type
and the actual value.
So this is doubling the amount of ram used to represent integers. But maybe worse, it makes them
subject to garbage collection. Both can probably be alleviated by having the first 256 pinned, ie
a fixed array, but still.
Also using a dedicated memory manager for them and keeping a pool of unused as a linked list
should make it quick. And off course the main hope lies in the fact that your average program
nowadays (especially oo) does not really use integers all that much.
## OO to the rescue
Off course this is not the first time my thoughts have strayed that way. There are two reasons why
they quickly scuttled back home to known territory before. The first was the automatic optimization
reflex: why use 2 words for something that can be done in one, and all that gc on top.
But the second was probably even more important: If we then have the value inside the object
(as a sort of instance variable or array element), then when we return it we have the "naked"
integer wreaking havoc in our system, as the code expects objects everywhere.
And if we don't return it, then how do operations happen, since machines only operate on values.
The thing that i had not considered is that that line of thinking is mixing up the levels
of abstraction. It assumes a lower level than one needs: What is needed is that the system
knows about integer objects (in a similar way that the other way assumes knowledge of integer
values.)
Concretely the "machine", or compiler, needs to be able to perform the basic Integer operations,
on the Integer objects. This is really not so different from it knowing how to perform the
operations on two values. It just involves getting the actual values from the object and
putting them back.
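
A rough sketch of that idea in plain ruby (illustrative only, not the actual rubyx implementation; IntegerObject, INTEGER_TYPE and builtin_plus are made-up names):

```ruby
# Hypothetical sketch: what a builtin integer addition could do internally.
# The two words of an integer object are its type and its machine value.
class IntegerObject
  attr_reader :type, :value   # in the real design the value would not be readable from outside

  def initialize(type, value)
    @type = type
    @value = value
  end
end

INTEGER_TYPE = :integer_type  # stand-in for the real Type instance

def builtin_plus(receiver, argument)
  raw = receiver.value + argument.value   # the only place where naked values exist
  IntegerObject.new(INTEGER_TYPE, raw)    # wrap the result, the value never escapes
end

sum = builtin_plus(IntegerObject.new(INTEGER_TYPE, 4), IntegerObject.new(INTEGER_TYPE, 5))
puts sum.value   # => 9
```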
OO helps in another way that never occurred to me. **Data hiding:** we never actually pass out
the value. The value is private to the object and not accessible from the outside. In fact it not
even accessible from the inside to the object itself. Admittedly this means more functionality in
the compiler, but since that is a solved problem (see builtin), it's ok.
## Unified method caching
So having gained this unification, we can now determine the type of an object very very easily.
The type will *always* be the first word of the memory that the object occupies. We don't have
immediate values anymore, so always is always.
This is *very* handy, since we have given up being god and thus knowing everything at any time.
In concrete terms this means that in a method, we can *not* know what type an object is.
In fact it's worse: we can't even say what type it is, even if we have checked it, once we
have passed it as an argument to another method.
Luckily programs are not random, and it is quite rare for an object to change type, and so a given
object will usually have one of a very small set of types. This can be used to do method caching.
Instead of looking up the method statically and calling it unconditionally at run-time, we will
need some kind of lookup at run-time.
The lookup tables can be objects that the method carries. A small table (3 entries) with pairs of
type vs jump address. A little assembler to go through the list and jump, or in case of a miss
jump to some handler that does a real lookup in the type.
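
A minimal ruby model of such a table (illustrative only; SmallCache, CacheEntry and lookup_method are assumptions, not the actual rubyx classes):

```ruby
# Sketch of a small per-call-site cache: pairs of type vs. jump target,
# with a real lookup in the type on a miss. All names are made up.
Type = Struct.new(:methods) do
  def lookup_method(name)
    methods.fetch(name)                        # the slow, "real" lookup
  end
end

CacheEntry = Struct.new(:type, :target)

class SmallCache
  MAX = 3                                      # a small table, as described

  def initialize
    @entries = []
  end

  def target_for(type, method_name)
    hit = @entries.find { |entry| entry.type == type }
    return hit.target if hit                   # cache hit: jump straight there
    target = type.lookup_method(method_name)   # miss: full lookup in the type
    @entries.shift if @entries.size >= MAX     # keep the table small
    @entries << CacheEntry.new(type, target)
    target
  end
end

string_type = Type.new(putstring: ->(obj) { puts obj })
cache = SmallCache.new
cache.target_for(string_type, :putstring).call("Hello")   # first call fills the cache
```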
In a distant future a smaller version may be created. For the case where the type has been
checked already during the method, a further check may be inlined completely into the code and
only revert to the table in case of a miss. But that's down the road a bit.
Next question: How does this work with Parfait. Or the interpreter??

View File

@ -0,0 +1,88 @@
%p
As i said in the last post, a step back and forward, possibly two, was taken and understanding
grows again. Especially when i think that some way is the way, it always changes and i turn out
to be at least partially wrong. The way of life, of imperfect intelligence, to strive for that
perfection that is forever out of reach. Here's the next installment.
%h2#slopes-and-ramps Slopes and Ramps
%p
When thinking about method caching and how to implement it i came across this thing that i will
call a Slope for now. The Slope of a function that is. At least that's where the thought started.
%p The Slope of a function is a piece of code that has two main properties:
%ul
%li
it is straight, up to the end. i mean it has no branches from the outside.
It may have some internally but that does not affect anything.
%li it ends in a branch that returns (a call), but this is not part of the Slope
%p
Those
%em two
properties would better be called a Ramp. The Ramp the function goes along before it
jumps off to the next function.
%p
The
%strong Slope
is the part before the jump. So a Ramp is a Slope and a Jump.
%p
Code in the Slope, it struck me, has the unique possibility of doing a jump, without worrying about
returning. After all, it knows there is a call coming. After contemplating this a little i
found the flaw, which one understands when thinking about where the function returns to. So a Slope
can jump away without caring if (and only if) the return address is set to after that jump (and the
address is actually set by the code before the jump).
%p
Remembering that we set the return address in the caller (not as in c the callee) we can arrange
for that. And so we can write Slope code that just keeps going. Because once the return address
is set up, the code can just keep jumping forward. The only thing is that the call must come.
%p
In more concrete terms: Method caching can be a series of checks and jumps. If the check is ok
we call, otherwise jump on. And even the last fail (the switch's default case) can be a jump
to what we would otherwise call a method. A method that determines the real jump target from
the type (of self, in the message) and calls it. Except it's not a method because it never
returns, which is symmetrical to us not calling it.
%p
So this kind of “method” which is not really a method, but still a fair bit of logic, i'll call
a Slope.
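%p
  To make that a little more concrete, here is a runnable toy model in plain ruby (not the
  actual rubyx code, and the return-address trick does not show at this level; all names are
  made up): a chain of type checks where a hit calls the cached method and the final miss
  hands over to a resolver that finds the real target from the type.
%pre
  :preserve
    # toy model of the cached send described above
    def cached_send(receiver, cache, resolver)
      cache.each do |type, method|
        return method.call(receiver) if receiver.class == type   # check ok: we call
      end
      resolver.call(receiver)      # the default case: just keep going to the resolver,
    end                            # which finds the real method and calls it itself

    cache = { String => ->(obj) { puts obj } }
    resolver = ->(obj) { puts "resolved at runtime: " + obj.inspect }
    cached_send("hello", cache, resolver)   # cache hit
    cached_send(42, cache, resolver)        # miss, goes through the resolver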
%h2#links-and-chains Links and Chains
%p
A Slope, the story continues, is really just a specific case of something else. If we take away
the expectation that a call is coming, we are left with a sequence of code with jumps to more
code. This could be called a Chain, and each part of the Chain would be a Link.
%p
To define that: a
%strong Link
is a sequence of code that ends in a jump. It has no other jumps, just
the one at the end. And the jump at the end jumps to another Link.
%p The Code i am talking about here is risc level code, one could say assembler instructions.
%p
The concept though is very familiar: at a higher level the Link would be a Statement and a
Chain a sequence of Statements. We're still missing the branch abstraction, but otherwise this is
a lower level description of code in a similar way as the typed level Code and Statements are
a description of higher level code.
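%p
  In code, a Link and a Chain could be modelled roughly like this (a sketch only, the real
  instruction classes look different):
%pre
  :preserve
    # toy model: a Link is straight code ending in a jump to another Link,
    # a Chain is what you get by following those jumps
    class Link
      attr_accessor :code, :next_link

      def initialize(code)
        @code = code
      end

      def jump_to(link)
        @next_link = link
        link
      end
    end

    class Chain
      def initialize(first)
        @first = first
      end

      def each_link
        link = @first
        while link
          yield link
          link = link.next_link
        end
      end
    end

    first = Link.new("load")
    first.jump_to(Link.new("add")).jump_to(Link.new("store"))
    Chain.new(first).each_link { |link| puts link.code }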
%h2#typed-level-is-wrong Typed level is wrong
%p
The level that is nowadays called Typed, and used to be soml, is basically made up of language
constructs. It does not allow for manipulation of the risc level. As the ruby level is translated
to the typed level, which in turn is translated to the risc level, the ruby compiler has no
way of manipulating the risc level. This is as it should be.
%p
The problem is just, that the constructs that are currently at the typed level, do not allow
to express the results needed at the risc level.
%p
Through the history of the development the levels have become mixed up. It is relatively clear at
the ruby level what kind of construct is needed at the risc level. This is what has to drive the
constructs at the typed level. We need access to these kinds of Slope or Link ideas at the ruby
level.
%p
Another way of looking at the typed level inadequacies is the size of the codes generated. Some of
the expressions (or statements) resolve to 2 or 3 risc instructions. Others, like the call, are
15. This is an indication that part of the level is wrong. A good way to architect the layers
would result in an
%em even
expansion of the amount of code at every level.
%h2#too-little-testing Too little testing
%p
The ruby compiler should really drive the development more. The syntax and behavior of ruby are
quite clear, and i feel the risc layer is quite a solid target. So before removing too much or
rewriting too much i shall just add more (and more) functionality to the typed layer.
%p
At the same time some of the concepts (like a method call) will probably not find any use, but
as long as they dont harm, i shall leave them lying around.

View File

@ -1,84 +0,0 @@
As i said in the last post, a step back and forward, possibly two, was taken and understanding
grows again. Especially when i think that some way is the way, it always changes and i turn out
to be at least partially wrong. The way of life, of imperfect intelligence, to strive for that
perfection that is forever out of reach. Here's the next installment.
## Slopes and Ramps
When thinking about method caching and how to implement it i came across this thing that i will
call a Slope for now. The Slope of a function that is. At least that's where the thought started.
The Slope of a function is a piece of code that has two main properties:
- it is straight, up to the end. i mean it has no branches from the outside.
It may have some internally but that does not affect anything.
- it ends in a branch that returns (a call), but this is not part of the Slope
Those *two* properties would better be called a Ramp. The Ramp the function goes along before it
jumps off to the next function.
The **Slope** is the part before the jump. So a Ramp is a Slope and a Jump.
Code in the Slope, it struck me, has the unique possibility of doing a jump, without worrying about
returning. After all, it knows there is a call coming. After contemplating this a little i
found the flaw, which one understands when thinking about where the function returns to. So a Slope
can jump away without caring if (and only if) the return address is set to after that jump (and the
address is actually set by the code before the jump).
Remembering that we set the return address in the caller (not as in c the callee) we can arrange
for that. And so we can write Slope code that just keeps going. Because once the return address
is set up, the code can just keep jumping forward. The only thing is that the call must come.
In more concrete terms: Method caching can be a series of checks and jumps. If the check is ok
we call, otherwise jump on. And even the last fail (the switch's default case) can be a jump
to what we would otherwise call a method. A method that determines the real jump target from
the type (of self, in the message) and calls it. Except it's not a method because it never
returns, which is symmetrical to us not calling it.
So this kind of "method" which is not really a method, but still a fair bit of logic, i'll call
a Slope.
## Links and Chains
A Slope, the story continues, is really just a specific case of something else. If we take away
the expectation that a call is coming, we are left with a sequence of code with jumps to more
code. This could be called a Chain, and each part of the Chain would be a Link.
To define that: a **Link** is a sequence of code that ends in a jump. It has no other jumps, just
the one at the end. And the jump at the end jumps to another Link.
The Code i am talking about here is risc level code, one could say assembler instructions.
The concept though is very familiar: at a higher level the Link would be a Statement and a
Chain a sequence of Statements. We're still missing the branch abstraction, but otherwise this is
a lower level description of code in a similar way as the typed level Code and Statements are
a description of higher level code.
## Typed level is wrong
The level that is nowadays called Typed, and used to be soml, is basically made up of language
constructs. It does not allow for manipulation of the risc level. As the ruby level is translated
to the typed level, which in turn is translated to the risc level, the ruby compiler has no
way of manipulating the risc level. This is as it should be.
The problem is just, that the constructs that are currently at the typed level, do not allow
to express the results needed at the risc level.
Through the history of the development the levels have become mixed up. It is relatively clear at
the ruby level what kind of construct is needed at the risc level. This is what has to drive the
constructs at the typed level. We need access to these kinds of Slope or Link ideas at the ruby
level.
Another way of looking at the typed level inadequacies is the size of the codes generated. Some of
the expressions (or statements) resolve to 2 or 3 risc instructions. Others, like the call, are
15. This is an indication that part of the level is wrong. A good way to architect the layers
would result in an *even* expansion of the amount of code at every level.
## Too little testing
The ruby compiler should really drive the development more. The syntax and behavior of ruby are
quite clear, and i feel the risc layer is quite a solid target. So before removing too much or
rewriting too much i shall just add more (and more) functionality to the typed layer.
At the same time some of the concepts (like a method call) will probably not find any use, but
as long as they don't harm, i shall leave them lying around.

View File

@ -0,0 +1,90 @@
%p Going on holiday without a computer was great. Forcing me to recap and write things down on paper.
%h2#layers Layers
%p
One of the main results was that the current layers are a bit mixed up and that will have to be
fixed. But first, some of the properties in which i think of the different layers.
%h3#layer-properties Layer properties
%p
%strong Structure of the representation
is one of the main distinctions of the layers. We know the parser gives us a
%strong tree
and that the produced binary is a
= succeed "," do
%strong blob
%p
A closely related property of the representation is whether it is
= succeed "." do
%strong abstract or concrete
%p
If we think of the layer as a language, what
%strong Language level
would it be, assembler, c, oo.
Does it have
= succeed "," do
%strong control structures
= succeed "." do
%strong jumps
%h3#ruby-layer Ruby Layer
%p
The top ruby layer is a given, since it is provided by the external gem
= succeed "." do
%em parser
= succeed "." do
%em tree
%p
It might sound self-evident that this layer is very close to ruby, but this means it inherits
all of ruby's quirks, and all the redundancy that makes ruby a nice language. By quirks i mean
things like the integer 0 being true in an if statement. A good example of redundancy is the
existence of if and until, or the ability to add if after the statement.
%h3#virtual-language Virtual Language
%p
The next layer down, and the first to be defined in ruby-x, is the virtual language layer.
By language i mean object oriented language, and by virtual a non-existent minimal version of an
object oriented language. This is like ruby, but without the quirks or redundancy. This is
meant to be compatible with other oo languages, meaning that it should be possible to transform
a python or smalltalk program into this layer.
%p
The layer is represented as a concrete tree and derived from the ast by removing:
\- unless, the ternary operator and post conditionals
\- splats and multi-assignment
\- implicit block passing
\- case statement
\- global variables
%p It should be relatively obvious how these can be replaced by existing constructs (details in code)
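%p
  As a small illustration (plain ruby, not compiler code), the removed constructs map onto
  the ones that stay roughly like this:
%pre
  :preserve
    x = -3

    # the ternary  a = (x > 0 ? x : -x)  becomes a plain if/else
    if x > 0
      a = x
    else
      a = -x
    end

    # the post conditional  warn("negative") unless x >= 0  becomes an if on the negated test
    if !(x >= 0)
      warn("negative")
    end

    # a case/when becomes an if/elsif chain using ===
    kind = "text"
    result =
      if Integer === kind
        :int_code
      elsif String === kind
        :str_code
      end

    puts a        # => 3
    puts result   # => str_code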
%h3#virtual-object-machine Virtual object Machine
%p
The next layer down represents what we think of as a machine, more than a language, and an object
oriented one at that.
%p
A differentiating factor is that a machine has no control structures like a language. Only jumps.
The logical structure is more a stream or array. Something closer to the memory that
i will map to in lower layers. We still use a tree representation for this level, but with the
interpretation that neighboring children get implicitly jumped to.
%p
The machine deals in objects, not in memory as a von Neumann machine would. The machine has
instructions to move data from one object to another. There are no registers, just objects.
Also basic arithmetic and testing is covered by the instruction set.
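%p
  A toy version of such an object-to-object move (the class names here are made up, not the
  real Mom instructions) could look like this:
%pre
  :preserve
    # move data from a slot of one object into a slot of another, no registers involved
    class SlotLoad
      def initialize(from_object, from_slot, to_object, to_slot)
        @from_object, @from_slot = from_object, from_slot
        @to_object, @to_slot = to_object, to_slot
      end

      def execute
        value = @from_object.instance_variable_get(@from_slot)
        @to_object.instance_variable_set(@to_slot, value)
      end
    end

    class Message ; attr_accessor :receiver ; end
    class Frame   ; attr_accessor :local    ; end

    message = Message.new
    message.receiver = "hello"
    frame = Frame.new
    SlotLoad.new(message, :@receiver, frame, :@local).execute
    puts frame.local    # => hello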
%h3#risc-layer Risc layer
%p
This layer is a minimal abstraction of an arm processor. Ie there are eight registers, instructions
to and from memory and between registers. Basic integer operations work on registers. So does
testing, and off course there are jumps. While the layer deals in random access memory, it is
aware of and uses the object machine's objects.
%p
The layer is minimal in the sense that it defines only instructions needed to implement ruby.
Instructions are defined in a concrete manner, ie one class per Instruction, which make the
set of Instructions extensible by other gems.
%p
The structure is a linked list which is mainly interested in three types of Instructions. Namely
Jumps, jump targets (Labels), and all others. All the other Instructions are linear in the von Neumann
sense, that the next instruction will be executed implicitly.
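%p
  A rough sketch of that structure (illustrative class names, not the actual rubyx instruction set):
%pre
  :preserve
    # toy model of the risc layer's list: every instruction knows the next one,
    # Jumps and Labels are the two kinds the structure cares about
    class Instruction
      attr_accessor :next_instruction    # implicit fall-through to the next one
    end

    class Label < Instruction            # a jump target
      attr_reader :name
      def initialize(name) ; @name = name ; end
    end

    class Jump < Instruction             # breaks the linear flow
      attr_reader :target
      def initialize(target) ; @target = target ; end
    end

    class Transfer < Instruction ; end   # one of "all the others", purely linear

    loop_start = Label.new(:loop_start)
    body = Transfer.new
    back = Jump.new(loop_start)
    loop_start.next_instruction = body
    body.next_instruction = back
    puts back.target.name                # => loop_start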
%h3#arm-and-elf-layer Arm and elf Layer
%p
The mapping of the risc layer to the arm layer is very straightforward, basically one to one with
the exception of constant loading (which is quirky on the arm 32 bit due to historical reasons).
Arm instructions (being instructions of a real cpu), have the ability to assemble themselves into
binary, which apart from the loading are 4 bytes.
%p The structure of the Arm instruction is the same as the risc layer, a linked list.
%p
There is also code to assemble the objects, and with the instruction stream make a binary elf
executable. While elf support is minimal, the executable does execute on raspberry pi or qemu.

View File

@ -1,88 +0,0 @@
Going on holiday without a computer was great. Forcing me to recap and write things down on paper.
## Layers
One of the main results was that the current layers are a bit mixed up and that will have to be
fixed. But first, some of the properties in which i think of the different layers.
### Layer properties
**Structure of the representation** is one of the main distinctions of the layers. We know the parser gives us a **tree** and that the produced binary is a **blob**, but what is in between? As options we would still have graphs and lists.
A closely related property of the representation is whether it is **abstract or concrete**.
An abstract representation is represented as a single class in ruby and its properties are
accessible through an abstract interface, like a hash. A concrete representation would use
a class per type, have properties available as ruby attributes and thus allow functions on the
class.
If we think of the layer as a language, what **Language level** would it be, assembler, c, oo.
Does it have **control structures**, or **jumps**.
### Ruby Layer
The top ruby layer is a given, since it is provided by the external gem *parser*.
Parser outputs an abstract syntax tree (AST), so it is a *tree*. Also it is abstract, thus
represented by a single ruby class, which carries a type as an attribute.
It might sound self-evident that this layer is very close to ruby, but this means it inherits
all of ruby's quirks, and all the redundancy that makes ruby a nice language. By quirks i mean
things like the integer 0 being true in an if statement. A good example of redundancy is the
existence of if and until, or the ability to add if after the statement.
### Virtual Language
The next layer down, and the first to be defined in ruby-x, is the virtual language layer.
By language i mean object oriented language, and by virtual a non-existent minimal version of an
object oriented language. This is like ruby, but without the quirks or redundancy. This is
meant to be compatible with other oo languages, meaning that it should be possible to transform
a python or smalltalk program into this layer.
The layer is represented as a concrete tree and derived from the ast by removing:
- unless, the ternary operator and post conditionals
- splats and multi-assignment
- implicit block passing
- case statement
- global variables
It should be relatively obvious how these can be replaced by existing constructs (details in code)
### Virtual object Machine
The next layer down represents what we think of as a machine, more than a language, and an object
oriented one at that.
A differentiating factor is that a machine has no control structures like a language. Only jumps.
The logical structure is more a stream or array. Something closer to the memory that
i will map to in lower layers. We still use a tree representation for this level, but with the
interpretation that neighboring children get implicitly jumped to.
The machine deals in objects, not in memory as a von Neumann machine would. The machine has
instructions to move data from one object to another. There are no registers, just objects.
Also basic arithmetic and testing is covered by the instruction set.
### Risc layer
This layer is a minimal abstraction of an arm processor. Ie there are eight registers, instructions
to and from memory and between registers. Basic integer operations work on registers. So does
testing, and off course there are jumps. While the layer deals in random access memory, it is
aware of and uses the object machine's objects.
The layer is minimal in the sense that it defines only instructions needed to implement ruby.
Instructions are defined in a concrete manner, ie one class per Instruction, which make the
set of Instructions extensible by other gems.
The structure is a linked list which is mainly interested in three types of Instructions. Namely
Jumps, jump targets (Labels), and all others. All the other Instructions are linear in the von Neumann
sense, that the next instruction will be executed implicitly.
### Arm and elf Layer
The mapping of the risc layer to the arm layer is very straightforward, basically one to one with
the exception of constant loading (which is quirky on the arm 32 bit due to historical reasons).
Arm instructions (being instructions of a real cpu), have the ability to assemble themselves into
binary, which apart from the loading are 4 bytes.
The structure of the Arm instruction is the same as the risc layer, a linked list.
There is also code to assemble the objects, and with the instruction stream make a binary elf
executable. While elf support is minimal, the executable does execute on raspberry pi or qemu.

View File

@ -0,0 +1,79 @@
%p Method caching can be done at language level. Wow. But first some boring news:
%h2#vool-is-ready-mom-is-coming Vool is ready, Mom is coming
%p
The
= succeed "irtual" do
%strong V
= succeed "bject" do
%strong O
= succeed "riented" do
%strong O
= succeed "anguage" do
%strong L
%p
Vool will not reflect some of ruby's more advanced features, like splats or implicit blocks,
and hopes to make the conditional logic more consistent.
%p
The
= succeed "inimal" do
%strong M
= succeed "bject" do
%strong O
= succeed "achine" do
%strong M
%h2#inline-method-caching Inline Method caching
%p
In ruby almost all work is actually done by method calling and an interpreter spends much of its
time looking up methods to call. The obvious thing to do is to cache the result, and this has
been the plan for a while.
%p
Off course for caching to work, one needs a cache key and invalidation strategy, both of which
are handled by the static types, which i'll review below.
%h3#small-cache Small cache
%p
Aaron Patterson has done
%a{:href => "https://www.youtube.com/watch?v=b77V0rkr5rk"} research into method caching
in mri and found that most call sites (>99%) only need one cache entry.
%p
This means a single small object can carry the information needed, probably type, function address
and counter, times two.
%p
In rubyx this can literally be an object that we attach to the CallSite, either prefill if possible
or leave to be used at runtime.
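%p
  As a sketch (not the real rubyx object, and the names are made up), such a cache could be
  as small as this:
%pre
  :preserve
    # two entries, each holding a type, a function address and a counter
    class CacheEntry
      attr_accessor :cached_type, :function, :counter

      def initialize
        @counter = 0
      end
    end

    class CallSiteCache
      def initialize
        @primary   = CacheEntry.new    # "times two": one entry ...
        @secondary = CacheEntry.new    # ... and a spare
      end

      def hit_for(type)
        if @primary.cached_type == type
          @primary.counter += 1
          @primary.function
        elsif @secondary.cached_type == type
          @secondary.counter += 1
          @secondary.function
        end
      end
    end

    cache = CallSiteCache.new
    puts cache.hit_for(String).inspect   # => nil, nothing cached yet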
%h3#method-lookup-is-a-static-function Method lookup is a static function
%p
The other important idea here is that the actual lookup of a method is a known function. Known at
compile time, that is.
%p
Thus dynamic dispatch can be substituted by a cache lookup, and a static call. The result of the call
can/should update the cache and then we can start with the lookup again.
%p
This makes it possible to remove dynamic dispatch from the code, actually at code level.
I had previously thought of implementing the send at a lower level, but see now that it would
be quite possible to do it at the language level with an if and a call, possibly another call
for the miss. That would drop the language down from dynamic (4th level) to static (3rd level).
%p I am still somewhat at odds whether to actually do this or leave it for the machine level (mom).
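%p
  In plain ruby the language level version would read roughly like this (an illustration only,
  with made-up names; the real lookup and cache object look different):
%pre
  :preserve
    # a dynamic send replaced by a cache check, a static call, and a miss branch
    def cached_call(receiver, cache, lookup)
      if cache[:type] == receiver.class          # hit: an if and a (static) call
        cache[:function].call(receiver)
      else                                       # miss: the extra call that also
        cache[:type] = receiver.class            # updates the cache
        cache[:function] = lookup.call(receiver.class)
        cache[:function].call(receiver)
      end
    end

    lookup = ->(klass) { ->(obj) { puts klass.name + " putstring: " + obj.to_s } }
    cache = {}
    cached_call("Hello", cache, lookup)   # first call misses and fills the cache
    cached_call("World", cache, lookup)   # second call takes the static path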
%h2#static-type-review Static Type review
%p
To make the caching possible, the cache key - value association has to be constant.
Off course in oo systems the class of an object is constant and so we could just use that.
But in ruby you can change the class, add instance variables or add/remove/change methods,
and so the class as a key and the method as value is not correct over time.
%p
In rubyx, an object has a type, and its type can change. But a type can never change. A type refers
to the class that it represented at the time of creation. Conversely a class carries an instance
type, which is the type of new instances that get created. But when variables or methods are added
or removed from the class, a new type is created. Type instances never change. Method implementations
are attached to types, and once compiled, never changed either.
%p
Thus using the object's type as cache key and the method as its value will stay correct over time.
And the double bonus of this is that it takes care of both objects of different classes (as those will have different type for sure), but also objects of the same class, at different times, when
eg a method with the same name has been added. Those objects will have different type too, and
thus experience a cache miss and have their correct method found.
%h2#up-next Up next
%p
More grunt-work. Now that Vool replaces the ast the code from rubyx/passes has to be “ported” to use it. That means:
\- class extraction and class object creation
\- method extraction and creation
\- type creation by ivar analysis
\- frame creation by local variable analysis

View File

@ -1,77 +0,0 @@
Method caching can be done at language level. Wow. But first some boring news:
## Vool is ready, Mom is coming
The **V**irtual **O**bject **O**riented **L**anguage level, as envisioned in the previous post,
is done. Vool is meant to be a language agnostic layer, and is typed, unlike the ast that
the ruby parser outputs. This will allow to write more oo code, by putting code into the
statement classes, rather than using the visitor pattern. I tend to agree with CodeClimate on
the fact that the visitor pattern produces bad code.
Vool will not reflect some of ruby's more advanced features, like splats or implicit blocks,
and hopes to make the conditional logic more consistent.
The **M**inimal **O**bject **M**achine will be the next layer. It will sit between Vool and Risc
as an object version of the Risc machine. This is mainly to make it more understandable, as i
noticed that part of the Risc, especially calling, is getting quite complex. But more on that next..
## Inline Method caching
In ruby almost all work is actually done by method calling and an interpreter spends much of its
time looking up methods to call. The obvious thing to do is to cache the result, and this has
been the plan for a while.
Off course for caching to work, one needs a cache key and invalidation strategy, both of which
are handled by the static types, which i'll review below.
### Small cache
Aaron Patterson has done [research into method caching](https://www.youtube.com/watch?v=b77V0rkr5rk)
in mri and found that most call sites (>99%) only need one cache entry.
This means a single small object can carry the information needed, probably type, function address
and counter, times two.
In rubyx this can literally be an object that we attach to the CallSite, either prefill if possible
or leave to be used at runtime.
### Method lookup is a static function
The other important idea here is that the actual lookup of a method is a known function. Known at
compile time that is.
Thus dynamic dispatch can be substituted by a cache lookup, and a static call. The result of the call
can/should update the cache and then we can start with the lookup again.
This makes it possible to remove dynamic dispatch from the code, actually at code level.
I had previously thought of implementing the send at a lower level, but see now that it would
be quite possible to do it at the language level with an if and a call, possibly another call
for the miss. That would drop the language down from dynamic (4th level) to static (3rd level).
I am still somewhat at odds whether to actually do this or leave it for the machine level (mom).
## Static Type review
To make the caching possible, the cache key - value association has to be constant.
Off course in oo systems the class of an object is constant and so we could just use that.
But in ruby you can change the class, add instance variables or add/remove/change methods,
and so the class as a key and the method as value is not correct over time.
In rubyx, an object has a type, and its type can change. But a type can never change. A type refers
to the class that it represented at the time of creation. Conversely a class carries an instance
type, which is the type of new instances that get created. But when variables or methods are added
or removed from the class, a new type is created. Type instances never change. Method implementations
are attached to types, and once compiled, never changed either.
Thus using the object's type as cache key and the method as its value will stay correct over time.
And the double bonus of this is that it takes care of both objects of different classes (as those will have different type for sure), but also objects of the same class, at different times, when
eg a method with the same name has been added. Those objects will have different type too, and
thus experience a cache miss and have their correct method found.
## Up next
More grunt-work. Now that Vool replaces the ast the code from rubyx/passes has to be "ported" to use it. That means:
- class extraction and class object creation
- method extraction and creation
- type creation by ivar analysis
- frame creation by local variable analysis

View File

@ -0,0 +1,91 @@
%p
While work on Mom (Minimal object machine) continues, i can see the future a little clearer.
Alas, for now the shortest route is best, so the future will have to wait. But here is what i'm
thinking.
%h2#types-today Types today
%p
The
%a{:href => "/rubyx/layers.html"} architecture
document outlines this in more detail, but in short:
\- types are immutable
\- every object has a type (which may change)
\- a type implements the interface of a class at a given time
\- a type is defined by a list of attribute names
%p
%img{:alt => "Types diagram", :src => "/assets/types.jpg"}/
%h3#how-classes-work How classes work
%p
So the interesting thing here is how the classes work. Seeing as they are open, attributes can
be added and removed, but the types are immutable.
%p The solution is easy: when a new attribute is added to a class, a new type is created.
%p
The
%em instance type
is then updated to point to the current type. This means that new objects will
be created with the new type, and old ones will keep their old type. Until the attribute is
added to them too, in which case their
%em type
is updated too.
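%p
  A small runnable model of that mechanism (illustrative only, not the actual Parfait code):
%pre
  :preserve
    # types are immutable lists of attribute names; adding an attribute to a class
    # creates a new type and only updates the class's instance type
    class Type
      attr_reader :attribute_names

      def initialize(attribute_names)
        @attribute_names = attribute_names.freeze   # a type never changes
      end

      def add_attribute(name)
        Type.new(attribute_names + [name])          # a new type, not a mutation
      end
    end

    class RClass                                    # stand-in for a rubyx class object
      attr_reader :instance_type

      def initialize
        @instance_type = Type.new([:type])
      end

      def add_attribute(name)
        @instance_type = @instance_type.add_attribute(name)   # new objects get the new type
      end
    end

    point = RClass.new
    old_type = point.instance_type
    point.add_attribute(:x)
    puts old_type.attribute_names.inspect             # => [:type]   (old objects keep this)
    puts point.instance_type.attribute_names.inspect  # => [:type, :x]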
%p
%strong Methods
btw are stored at the Type, as they encode the knowledge of the memory layout
that comes with the type, into the code of the method. Remember: full data hiding, only an object's own
methods can access the variables, hence the type needs to be known only for
= succeed "." do
%em self
%h2#the-future-of-types The future of types
%p
But what i wanted to talk about is how this picture is going to change in the future.
To understand why we might want to, lets look at method dispatch on an instance variable.
%p
When you write something like @me.length , the compiler can check that @me is indeed an instance variable by checking the type of self. But since no information is stored about the type of
%em me
, a dynamic dispatch is needed to call
= succeed "." do
%em length
%p
The simple idea is to get rid of this dynamic dispatch by storing the type of instance variables
too. This makes a lot of calls faster, but it does come at significant cost:
\- every assignment to the variable has to be checked for type.
\- many more types must be created to differentiate the variables by name
%strong and
type.
%p
Both of those don't maybe sound so bad at first, but it's the cumulative effects that make a
difference. Instance assignment is one of the only two ways to move data around in an oo machine.
That's a lot of checking. And Types hold the methods, so for every new type
%em all
methods have
to be
%em a
stored, and
%em b
created/compiled.
%p But off course the biggest thing is all the coding this entails. So that's why it's in the future :-)
%h2#multilayered-mom Multilayered Mom
%p
Just a note on Mom: this was meant to be a bridge between the language layer (vool) and the machine
layer (risc). This step, from tree and statements, to list and low level instructions was deemed
too big, so the abstract Minimal Object Machine is supposed to be a layer in between those.
And it is off course.
%p
What i didn't fully appreciate before starting was that the two things are related. I mean
statements lend themselves to a tree, while having instructions in a tree is kind of silly.
Similarly statements in a list don't really make sense either. So it ended up being a two step
process inside Mom.
%p
The
%em first
pass that transforms vool, keeps the tree structure. But it does introduce Moms own
instructions. It turns out that this is sensible for exactly the linear parts of code.
%p
The
%em second
pass flattens the remaining control structures into jumps and labels. The result
maps to the risc layer 1 to n, meaning every Mom instruction simply expands into one or usually
more risc instructions.
%p
In the future i envision that this intermediate representation at the Mom level will be a
good place for further optimisations, but we shall see. At least the code is still recognisable,
meaning relatively easy to reason about. This is a property that the risc layer really does
not have anymore.

View File

@ -1,72 +0,0 @@
While work on Mom (Minimal object machine) continues, i can see the future a little clearer.
Alas, for now the shortest route is best, so the future will have to wait. But here is what i'm
thinking.
## Types today
The [architecture](/rubyx/layers.html) document outlines this in more detail, but in short:
- types are immutable
- every object has a type (which may change)
- a type implements the interface of a class at a given time
- a type is defined by a list of attribute names
![Types diagram](/assets/types.jpg)
### How classes work
So the interesting thing here is how the classes work. Seeing as they are open, attributes can
be added and removed, but the types are immutable.
The solution is easy: when a new attribute is added to a class, a new type is created.
The *instance type* is then updated to point to the current type. This means that new objects will
be created with the new type, and old ones will keep their old type. Until the attribute is
added to them too, in which case their *type* is updated too.
**Methods** btw are stored at the Type, as they encode the knowledge of the memory layout
that comes with the type, into the code of the method. Remember: full data hiding, only an object's own
methods can access the variables, hence the type needs to be known only for *self*.
## The future of types
But what i wanted to talk about is how this picture is going to change in the future.
To understand why we might want to, let's look at method dispatch on an instance variable.
When you write something like @me.length , the compiler can check that @me is indeed an instance variable by checking the type of self. But since no information is stored about the type of
*me* , a dynamic dispatch is needed to call *length*.
The simple idea is to get rid of this dynamic dispatch by storing the type of instance variables
too. This makes a lot of calls faster, but it does come at significant cost:
- every assignment to the variable has to be checked for type.
- many more types must be created to differentiate the variables by name **and** type.
Both of those don't maybe sound so bad at first, but it's the cumulative effects that make a
difference. Instance assignment is one of the only two ways to move data around in an oo machine.
That's a lot of checking. And Types hold the methods, so for every new type *all* methods have
to be *a* stored, and *b* created/compiled.
But off course the biggest thing is all the coding this entails. So that's why it's in the future :-)
## Multilayered Mom
Just a note on Mom: this was meant to be a bridge between the language layer (vool) and the machine
layer (risc). This step, from tree and statements, to list and low level instructions was deemed
too big, so the abstract Minimal Object Machine is supposed to be a layer in between those.
And it is off course.
What i didn't fully appreciate before starting was that the two things are related. I mean
statements lend themselves to a tree, while having instructions in a tree is kind of silly.
Similarly statements in a list don't really make sense either. So it ended up being a two step
process inside Mom.
The *first* pass that transforms vool, keeps the tree structure. But it does introduce Mom's own
instructions. It turns out that this is sensible for exactly the linear parts of code.
The *second* pass flattens the remaining control structures into jumps and labels. The result
maps to the risc layer 1 to n, meaning every Mom instruction simply expands into one or usually
more risc instructions.
In the future i envision that this intermediate representation at the Mom level will be a
good place for further optimisations, but we shall see. At least the code is still recognisable,
meaning relatively easy to reason about. This is a property that the risc layer really does
not have anymore.

View File

@ -0,0 +1,99 @@
%p Since i currently have no time to do actual work, i've been doing some research.
%p
Reading about other implementations, especially transpiling ones. Opal, ruby to
javascript, and jruby, ruby to java, or jvm instructions.
%h2#reconsidering-the-madness Reconsidering the madness
%p
One needs to keep an open mind off course. “Reinventing” the wheel is not good, they
say. Off course we dont invent any wheels in IT, we just like the way that sounds,
but even building a wheel, when you can buy one, is bad enough.
And off course i have looked at using other peoples code from the beginning.
%p
A special eye went towards the go language this time. Go has a built in assembler, i
didn't know that. Sure compilers use assembler stages, but the thing about go's
spin on it is that it is quite close to what i call the risc layer. Ie it is machine
independent and abstracts many of
%em real
assemblers' quirks away. And also go does
not expose the full assembler spectrum, so there are ways to write assembler within
go. All very promising.
%p
Go has closures, also very nice, and what they call escape analysis. Meaning that while
normally go will use the stack for locals, it has checks for closures and moves
variables to the heap if need be.
%p
So many goodies. And then there is the runtime and all that code that exists already,
so the std lib would be a straight pass through, much like mri. On top one of the best
gc's i've heard about, tooling, lots of code, interoperability and a community.
%p
The price is off course that one (me) would have to become an expert in go. Not too
bad, but still. As a preference i naturally tend to ruby, but maybe one can devise
a way to automate the bridge somewhat. Already found a gem to make extensions in go.
%p
And, while looking, there seems to be one or two ruby in go projects already out there.
Unfortunately interpreters :-(
%h2#sort-of-dealbreaker Sort of dealbreaker
%p
Looking deeper into transpiling and using the go runtime i read about the type system.
It's a good type system i think, and go even provides reflection. So it would be
nice to use it. This would provide good interoperability with go and use the existing
facilities.
%p
Just to scrape the alternative: One could use arrays as the basic structure to build
objects. Much in the same way MRI does. This would mean
%em not
using the type system,
but instead building one. Thinking of the wheels … no, no go.
%p
So a go type for each of what we currently have as Type. Since the current system
is built around immutable types, this seems a good match. The only glitch is that,
eg when adding an instance variable or method to an existing object, the type of that object
would have to change. A glitch, nothing more, just breaking the one constant static
languages are built on. But digging deep into the go code, i am relatively
certain one could deal with that.
%p
Digging deeper i read more about the go interfaces. I really can't see a way to have
%em only
specific (typed) methods or instances. I mean the current type model is about
type names and the number of slots, not typing every slot, as go does. Or for methods,
the idea is to have a name and a certain amount of arguments, and specific implementations for each type of self. Not a separate implementation for each possible combination of types. This means using go's interfaces for variables and methods.
%p
And here it comes: When using the reflect package to ensure the type safety at runtime,
go is really slow.
10+
%a{:href => "http://blog.burntsushi.net/type-parametric-functions-golang/"} times slower
maybe. I'm guessing it is not really their priority.
%p
Also, from an architecture kind of viewpoint, having all those interfaces doesn't seem
good. Many small objects, basically one interface object for every object
in the system, just adds lots of load. Unnecessary, ugly.
%h2#the-conclusion The conclusion
%p I just read about a go proposal to have int overflow panic. Too good.
%p
But in the end, i've decided to let go go. In some ways it would seem transpiling
to C would be much easier. Use the array, bake our types, bend those pointers.
While go is definitely the much better language for working in, for transpiling into
it seems to put up more hurdles than provide help.
%p
Having considered this, i can understand rubinius's choice of c++ much better.
The object model fits well. Given just a single slot for dynamic expansion one
could make that work. One would just have to use the c++ classes as types, not as ruby
classes. Classes are not types, not when you can modify them!
%p But at the end it is not even about which code you're writing, how good the fit.
%p
It is about design, about change. To make this work (this meaning compiling a dynamic language to binary), flexibility is the key. It's not done, much is unclear, and one
must be able to change and change quickly.
%p
Self change, just like in life, is the only real form of control. To maximise that
i didn't use metasm or llvm, and it is also the reason go will not feature in this
project. At the risk of never actually getting there, or having no users. Something
Sinatra sang comes to mind, about doing it a specific way :-)
%p
There is still a lot to be learnt from go though, as much from the language as the
project. I find it inspiring that they moved from a c to a go compiler in a minor
version. And that what must be a major language in google has less commits than
rails. It does give hope.
%p
PPS: Also revisited llvm (too complicated) and crystal (too complicated, bad fit in
type system) after this. Could still do rust off course, but the more i write, the
more i hear the call of simplicity (something that a normal person can still understand)

View File

@ -1,98 +0,0 @@
Since i currently have no time to do actual work, i've been doing some research.
Reading about other implementations, especially transpiling ones. Opal, ruby to
javascript, and jruby, ruby to java, or jvm instructions.
## Reconsidering the madness
One needs to keep an open mind off course. "Reinventing" the wheel is not good, they
say. Off course we don't invent any wheels in IT, we just like the way that sounds,
but even building a wheel, when you can buy one, is bad enough.
And off course i have looked at using other peoples code from the beginning.
A special eye went towards the go language this time. Go has a built in assembler, i
didn't know that. Sure compilers use assembler stages, but the thing about go's
spin on it is that it is quite close to what i call the risc layer. Ie it is machine
independent and abstracts many of *real* assemblers quirks away. And also go does
not expose the full assembler spectrum , so there are ways to write assembler within
go. All very promising.
Go has closures, also very nice, and what they call escape analysis. Meaning that while
normally go will use the stack for locals, it has checks for closures and moves
variables to the heap if need be.
So many goodies. And then there is the runtime and all that code that exists already,
so the std lib would be a straight pass through, much like mri. On top one of the best
gc's i've heard about, tooling, lot's of code, interoperability and a community.
The price is off course that one (me) would have to become an expert in go. Not too
bad, but still. As a preference i naturally tend to ruby, but maybe one can devise
a way to automate the bridge somewhat. Already found a gem to make extensions in go.
And, while looking, there seems to be one or two ruby in go projects already out there.
Unfortunately interpreters :-(
## Sort of dealbreaker
Looking deeper into transpiling and using the go runtime i read about the type system.
It's a good type system i think, and go even provides reflection. So it would be
nice to use it. This would provide good interoperability with go and use the existing
facilities.
Just to scrape the alternative: One could use arrays as the basic structure to build
objects. Much in the same way MRI does. This would mean *not* using the type system,
but instead building one. Thinking of the wheels ... no, no go.
So a go type for each of what we currently have as Type. Since the current system
is built around immutable types, this seems a good match. The only glitch is that,
eg when adding an instance variable or method to an existing object, the type of that object
would have to change. A glitch, nothing more, just breaking the one constant static
languages are built on. But digging deep into the go code, i am relatively
certain one could deal with that.
Digging deeper i read more about the go interfaces. I really can't see a way to have
*only* specific (typed) methods or instances. I mean the current type model is about
type names and the number of slots, not typing every slot, as go does. Or for methods,
the idea is to have a name and a certain amount of arguments, and specific implementations for each type of self. Not a separate implementation for each possible combination of types. This means using go's interfaces for variables and methods.
And here it comes: When using the reflect package to ensure the type safety at runtime,
go is really slow.
10+ [times slower](http://blog.burntsushi.net/type-parametric-functions-golang/)
maybe. I'm guessing it is not really their priority.
Also, from an architecture kind of viewpoint, having all those interfaces doesn't seem
good. Many small objects, basically one interface object for every object
in the system, just adds lots of load. Unnecessary, ugly.
## The conclusion
I just read about a go proposal to have int overflow panic. Too good.
But in the end, i've decided to let go go. In some ways it would seem transpiling
to C would be much easier. Use the array, bake our types, bend those pointers.
While go is definitely the much better language for working in, for transpiling into
it seems to put up more hurdles than provide help.
Having considered this, i can understand rubinius's choice of c++ much better.
The object model fits well. Given just a single slot for dynamic expansion one
could make that work. One would just have to use the c++ classes as types, not as ruby
classes. Classes are not types, not when you can modify them!
But at the end it is not even about which code you're writing, how good the fit.
It is about design, about change. To make this work (this meaning compiling a dynamic language to binary), flexibility is the key. It's not done, much is unclear, and one
must be able to change and change quickly.
Self change, just like in life, is the only real form of control. To maximise that
i didn't use metasm or llvm, and it is also the reason go will not feature in this
project. At the risk of never actually getting there, or having no users. Something
Sinatra sang comes to mind, about doing it a specific way :-)
There is still a lot to be learnt from go though, as much from the language as the
project. I find it inspiring that they moved from a c to a go compiler in a minor
version. And that what must be a major language in google has less commits than
rails. It does give hope.
PPS: Also revisited llvm (too complicated) and crystal (too complicated, bad fit in
type system) after this. Could still do rust off course, but the more i write, the
more i hear the call of simplicity (something that a normal person can still understand)

View File

@ -0,0 +1,116 @@
%p
Now that i
%em have
had time to write some more code (250 commits last month), here is
the good news:
%h2#sending-is-done Sending is done
%p
A dynamic language like ruby really has at its heart the dynamic method resolution. Without
that we'd be writing C++. Not much can be done in ruby without looking up methods.
%p
Yet all this time i have been running circles around this mother of a problem, because
(after all) it is a BIG one. It must be the one single most important reason why dynamic
languages are interpreted and not compiled.
%h2#a-brief-recap A brief recap
%p
Last year already i started on a rewrite. After hitting this exact same wall for the fourth
time. I put in some more Layers, the way a good programmer fixes any daunting problem.
%p
The
%a{:href => "https://github.com/ruby-x/rubyx"} Readme
has quite a good summary on the new layers,
and off course i'll update the architecture soon. But in case you didn't click, here is the
very very short summary:
%ul
%li
%p
Vool is a Virtual Object Oriented Language. Virtual in that it has no syntax of its own. But
it has semantics, and those are substantially simpler than ruby. Vool is Ruby without
the fluff.
%li
%p
Mom, the Minimal Object Machine layer is the first machine layer. Mom has no concept of memory
yet, only objects. Data is transferred directly from object
to object with one of Moms main instructions, the SlotLoad.
%li
%p
Risc layer here abstracts the Arm in a minimal and independent way. It does not model
any real RISC cpu instruction set, but rather implements what is needed for rubyx.
%li
%p
There is a minimal
%em Arm
translator that transforms Risc instructions to Arm instructions.
Arm instructions assemble themselves into binary code. A minimal
%em Elf
implementation is
able to create executable binaries from the assembled code and Parfait objects.
%li
%p
Parfait: Generating code (by descending above layers) is only half the story in an oo system.
The other half is classes, types, constant objects and a minimal run-time. This is
what Parfait is.
%h2#compiling-and-building Compiling and building
%p
After having finished all this layering work, i was back to square
= succeed ":" do
%em resolve
%p
But off course when i got there i started thinking that the resolve method (in ruby)
would need to resolve itself. And after briefly considering cheating (hardcoding type
information into this
%em one
method), i opted to write the code in Risc. Basically assembler.
%p
And it was horrible. It worked, but it was completely unreadable. So then i wrote a dsl for
generating risc instructions, using a combination of method_missing, instance_eval and
operator overloading. The result is quite readable code, a mixture between assembler and
a mathematical notation, where one can just freely name registers and move data around
with
%em []
and
= succeed "." do
%em <<
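%p
  Just to give a flavour, here is a tiny sketch of such a register dsl (an illustration in the
  same spirit, not the actual rubyx dsl; all names and the recorded instruction format are
  made up):
%pre
  :preserve
    # bare names become registers via method_missing, [] records a slot load,
    # << records a register transfer; build collects the instructions
    class RegisterRef
      attr_reader :name, :instructions

      def initialize(name, instructions)
        @name = name
        @instructions = instructions
      end

      def [](index)
        temp = (name.to_s + "_" + index.to_s).to_sym
        instructions << [:slot_to_reg, name, index, temp]   # load the slot into a temp register
        RegisterRef.new(temp, instructions)
      end

      def <<(other)
        instructions << [:transfer, other.name, name]       # move one register into another
        self
      end
    end

    class RiscBuilder
      def initialize
        @instructions = []
      end

      def method_missing(name, *args)
        RegisterRef.new(name, @instructions)                # any bare name is a register
      end

      def build(&block)
        instance_eval(&block)
        @instructions
      end
    end

    code = RiscBuilder.new.build do
      word << message[2]      # "move slot 2 of message into the word register"
    end
    puts code.inspect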
%p
By then resolving worked, but it was still a method. Since it was already in risc, i basically
inlined the code by creating a new Mom instruction and moving the code to its
= succeed "." do
%em to_risc
%p
A small bug in calling the resulting method was fixed, and
= succeed "," do
%em voila
%h2#the-proof The proof
%p
Previous, static, Hello Worlds looked like this:
\> “Hello world”.putstring
%p
Off course we can know the type that putstring applies to and so this does not
involve any method resolution at runtime, only at compile time.
%p
Today's step is thus:
\> a = “Hello World”
%blockquote
%p a.putstring
%p
This does involve a run-time lookup of the
%em putstring
method. It being a method on String,
it is indeed found and called.(1) Hurray.
%p
And maths works too:
\> a = 150
%blockquote
%p a.div10
%p
Does indeed result in 15. Even with the
%em new
integers. Part of the rewrite was to upgrade
integers to first class objects.
%p
PS(1): I know with more analysis the compiler
%em could
know that
%em a
is a String (or Integer),
but just now it doesn't. Take my word for it or even better, read the code.

View File

@ -1,90 +0,0 @@
Now that i *have* had time to write some more code (250 commits last month), here is
the good news:
## Sending is done
A dynamic language like ruby really has at its heart the dynamic method resolution. Without
that we'd be writing C++. Not much can be done in ruby without looking up methods.
Yet all this time i have been running circles around this mother of a problem, because
(after all) it is a BIG one. It must be the one single most important reason why dynamic
languages are interpreted and not compiled.
## A brief recap
Last year already i started on a rewrite. After hitting this exact same wall for the fourth
time. I put in some more Layers, the way a good programmer fixes any daunting problem.
The [Readme](https://github.com/ruby-x/rubyx) has quite a good summary on the new layers,
and off course i'll update the architecture soon. But in case you didn't click, here is the
very very short summary:
- Vool is a Virtual Object Oriented Language. Virtual in that it has no syntax of its own. But
it has semantics, and those are substantially simpler than ruby. Vool is Ruby without
the fluff.
- Mom, the Minimal Object Machine layer is the first machine layer. Mom has no concept of memory
yet, only objects. Data is transferred directly from object
to object with one of Mom's main instructions, the SlotLoad.
- Risc layer here abstracts the Arm in a minimal and independent way. It does not model
any real RISC cpu instruction set, but rather implements what is needed for rubyx.
- There is a minimal *Arm* translator that transforms Risc instructions to Arm instructions.
Arm instructions assemble themselves into binary code. A minimal *Elf* implementation is
able to create executable binaries from the assembled code and Parfait objects.
- Parfait: Generating code (by descending above layers) is only half the story in an oo system.
The other half is classes, types, constant objects and a minimal run-time. This is
what Parfait is.
## Compiling and building
After having finished all this layering work, i was back to square *resolve*: how to
dynamically, at run-time, resolve a method to binary. The strategy was going to be to have
some short risc based check and bail out to a method.
But off course when i got there i started thinking that the resolve method (in ruby)
would need to resolve itself. And after briefly considering cheating (hardcoding type
information into this *one* method), i opted to write the code in Risc. Basically assembler.
And it was horrible. It worked, but it was completely unreadable. So then i wrote a dsl for
generating risc instructions, using a combination of method_missing, instance_eval and
operator overloading. The result is quite readable code, a mixture between assembler and
a mathematical notation, where one can just freely name registers and move data around
with *[]* and *<<*.
By then resolving worked, but it was still a method. Since it was already in risc, i basically
inlined the code by creating a new Mom instruction and moving the code to its *to_risc*.
Now resolving still worked, and also looked good.
A small bug in calling the resulting method was fixed, and *voila*, ruby-x can dynamically call
any method.
## The proof
Previous, static, Hello Worlds looked like this:
> "Hello world".putstring
Off course we can know the type that putstring applies to and so this does not
involve any method resolution at runtime, only at compile time.
Today's step is thus:
> a = "Hello World"
> a.putstring
This does involve a run-time lookup of the *putstring* method. It being a method on String,
it is indeed found and called.(1) Hurray.
And maths works too:
> a = 150
> a.div10
Does indeed result in 15. Even with the *new* integers. Part of the rewrite was to upgrade
integers to first class objects.
PS(1): I know with more analysis the compiler *could* know that *a* is a String (or Integer),
but just now it doesn't. Take my word for it or even better, read the code.