ruby-x.github.io/_posts/2015-09-03-a-new-language.md

147 lines
8.0 KiB
Markdown
Raw Normal View History

2015-09-05 19:59:37 +02:00
---
layout: news
author: Torsten
---
It is the **one** thing i said i wasn't going to do: Write a language.
There are too many languages out there already, and just because i want to write a vm,
doesn't mean i want to add to the language jungle.
**But** ...
## The gap
As it happens in life, which is why they say never to say never, it happens just like it
i didn't want. It turns out the semantic gap of what i have is too large.
There is the **register level** , which is approximately assembler, and there is the **vm level**
which is more or less the ruby level. So my head hurts from trying to implement ruby in assembler,
no wonder.
Having run into this wall, which btw is the same wall that crystal ran into, one can see the sense
in what others have done more clearly: Why rubinus uses c++ underneath. Why crystal does not
implement ruby, but a statically typed language. And ultimately why there is no ruby compiler.
The gap is just too large to bridge.
## The need for a language
As I have the architecture of passes, i was hoping to get by with just another layer in the
architecture. A tried an tested approach after all. And while i won't say that that isn't a
possibility, i just don't see it. I think it may be one of those where hindsight will be perfect.
I can see as far as this: If i implement a language, that will mean a parser, ast and compiler.
The target will be my register layer. So a reasonable step up is a sort of object c, that has
basic integer maths and object access. I'll detail that more below, but the point is, if i have
that, i can start writing a vm implementation in that language.
Off course the vm implementation involves a parser, an ast and a compiler, unless we go to the free
compilers (see below). And so implementing the vm in a new language is in essence swapping nodes of
the higher level tree with nodes of the lower level (c-ish) one. Ie parsing should not strictly
speaking be necessary. This node swapping is after all what the pass architecture was designed
to do. But, as i said, i just can't see that happening (yet?).
### Trees vs. Blocks
Speaking of the Pass architecture: I flopped. Well, maybe not so much with the actual Passes, but
with the Method representation. Blocks holding Instructions, and being in essence a list.
Misinformed copying from llvm, misinformed by the final outcome. Off course the final binary
has a linear address space, but that is where the linearity ends. The natural structure of code
is a tree, not a list, as demonstrated by the parse *tree*. Flattening it just creates navigational
problems. Also as a metal model it is easier, as it is easy to imagine swapping out subtrees,
expanding or collapsing nodes etc.
## Saml - SalamA Machine Language
### Typed
Quite a while before cristalizing into the idea of a new language, i already saw the need for a type
system. Off course, and this dates back to the first memory layouts. But i mean the need for a
*strong typing* system, or maybe it's even clearer to call it compile time typing. The type that c
and c++ have. It is essential (mentally, this is off course all for the programmer, not the computer)
to be able to thing in a static type system, and then extend that and make it dynamic.
Or possibly use it in a dynamic way.
This is a good example of this too big gap, where one just steps on quicksand if everything is
all the time dynamic.
The way i had the implementation figured was to have different versions of the same function. In
each function we would have compile time types, everything known. I'll probably still do that,
just written in saml.
### Object c
The language needs to be object based, off course. Just because it's typed and not dynamic
and closer to assembler, doesn't mean we need to give up objects. In fact we mustn't. Saml (working
name) should be a little bit in like c++, ie compile time known variable arrangement and types,
objects. But no classes (or inheritance), more like structs, with full access to everything.
So a struct.variable syntax would mean grab that variable at that address, no functions, no possible
override, just get it. This is actually already implemented as i needed it for the slot access.
So objects without encapsulation or classes. A lower level object orientation.
### Citrus (or treetop) and whitequark
This new approach (and more experience) shed a new light on ruby parsing. The previous idea was to
start small, write the necessary stuff in the parsable subset and with time expand that set.
Alas . . ruby is a beast to parse, and because of the **semantic gap** writing the system,
even in a subset, is not viable. And it turns out the brave warriors of the ruby community have
already produced a pure, production ready, [ruby parser](https://github.com/whitequark/parser).
That can obviously read itself and anything else, so the start small approach is doubly out.
Also, when writing the debugger, i found that parslet is not opal compatible and that doesn't seem
to be changing. So, casting the net, i found Citrus which is small and clean without *any* runtime
dependency (a great feat). Citrus has a grammar, and i find at least it looks nicer than the ruby
grammar code. So for saml it will probably be that and as small a syntax as i can get away with.
### Interoperability
The system code needs to be callable from the higher level, and possibly the other way around.
This probably means the same or compatible calling mechanism and data model. The data model is
quite simple as the at the system level all is just machine words, but in object sized
packets. As for the calling it will probably mean that the same message object needs to be used
and what is now called calling at the machine level is supported. Sending off course won't be.
### Still missing a piece
How the level below calling can be represented is still open. It is clear though that it does need
to be present, as otherwise any kind of concurrency is impossible to achieve. The question ties
in with the still open question of [Quajects](http://valerieaurora.org/synthesis/SynthesisOS/ch4.html).
Meaning, what is the yin in the yin and yang of object oriented programming. The normal yang way sees
the code as active and the data as passive. By normal i mean oo implementations in which blocks and
closures just fall from the sky and have no internal structure. There is obviously a piece of
the puzzle missing that Alexia was onto.
### Start small
The first next step is to wrap the functionality i have in the Passes as a language.
Then to expand that language, by writing increasingly more complex programs in it.
And then to re-attack ruby using the whitequark parser, that probably means jumping on the
mspec train.
All in all, no biggie :-)
## Compilers are not free
Oh and i re-read and re-watched Toms [compilers for free](http://codon.com/compilers-for-free) talk,
which did make quite an impression on me the first time. But when i really thought about actually
going down that road (who does't enjoy a free beer), i got into the small print.
The second biggest of which is that writing a partial evaluator is just about as complicated
as writing a compiler.
But the biggest problem is that the (free) compiler you could get, has the implementation language
of the evaluator, as it's **output**. You need a compiler to start with, in other words.
Also the interpreter would have to be written in the same compilable language.
So writing a ruby compiler by writing a ruby interpreter would mean
writing the interpreter in c, and (worse) writing the partial evaluator *for* c, not for ruby.
Ok, maybe it is not quite as bad as that makes it sound. As i do have the register layer ready
and will be writing a c-ish language, it may even be possible to write an interpreter **in saml**,
and then it would be ok to write an evaluator **for saml** too.
I will nevertheless go the straighter route for now, ie write a compiler, and maybe return to the
promised freebie later. It does feel like a lot of what the partial evaluator is, would be called
compiler optimization in another lingo. So may be road will lead there naturally.