ruby-x.github.io/_posts/2015-09-03-a-new-language.md
Torsten Ruger eb8099db31 minor
2015-11-21 17:59:36 +02:00


layout: news
author: Torsten

It is the one thing i said i wasn't going to do: Write a language. There are too many languages out there already, and just because i want to write a vm, doesn't mean i want to add to the language jungle. But ...

The gap

As it happens in life, which is why they say never to say never, it turned out exactly the way i didn't want. The semantic gap in what i have is too large.

There is the register level, which is approximately assembler, and there is the vm level, which is more or less the ruby level. So my head hurts from trying to implement ruby in assembler, no wonder.

Having run into this wall, which btw is the same wall that crystal ran into, one can see the sense in what others have done more clearly: Why rubinius uses c++ underneath. Why crystal does not implement ruby, but a statically typed language. And ultimately why there is no ruby compiler. The gap is just too large to bridge.

The need for a language

As I have the architecture of passes, i was hoping to get by with just another layer in the architecture. A tried and tested approach after all. And while i won't say that that isn't a possibility, i just don't see it. I think it may be one of those things where hindsight will be perfect.

I can see as far as this: If i implement a language, that will mean a parser, ast and compiler. The target will be my register layer. So a reasonable step up is a sort of object c, that has basic integer maths and object access. I'll detail that more below, but the point is, if i have that, i can start writing a vm implementation in that language.
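To make the parser, ast, compiler chain a little more concrete, here is a minimal sketch of the last step, for integer maths only. Everything in it is an assumption made for illustration: the array-based ast, the `Compiler` class, the `:load`, `:add` and `:mul` instruction tuples and the `run` helper are hypothetical stand-ins, not the actual register layer.

```ruby
# Hypothetical register instructions, modelled as plain arrays:
#   [:load, reg, value]   reg = value
#   [:add,  dst, a, b]    dst = a + b
#   [:mul,  dst, a, b]    dst = a * b
class Compiler
  attr_reader :instructions

  def initialize
    @instructions = []
    @next_register = 0
  end

  # Compile an ast node and return the register that holds its result.
  def compile(node)
    case node[0]
    when :int
      reg = fresh
      @instructions << [:load, reg, node[1]]
      reg
    when :add, :mul
      left = compile(node[1])
      right = compile(node[2])
      reg = fresh
      @instructions << [node[0], reg, left, right]
      reg
    else
      raise ArgumentError, "unknown node #{node[0].inspect}"
    end
  end

  private

  # Hand out a new register name (r0, r1, ...), no reuse.
  def fresh
    reg = :"r#{@next_register}"
    @next_register += 1
    reg
  end
end

# Stand-in for the register machine: executes the instruction list
# and returns the value of the last register written.
def run(instructions)
  registers = {}
  last = nil
  instructions.each do |op, dst, a, b|
    registers[dst] = case op
                     when :load then a
                     when :add then registers[a] + registers[b]
                     when :mul then registers[a] * registers[b]
                     end
    last = dst
  end
  registers[last]
end

# 1 + 2 * 3, already parsed
compiler = Compiler.new
compiler.compile([:add, [:int, 1], [:mul, [:int, 2], [:int, 3]]])
run(compiler.instructions)   # => 7
```

The sketch never reuses a register, which is enough to show the shape of the step; a real register layer would of course have to allocate them.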

Of course the vm implementation involves a parser, an ast and a compiler, unless we go to the free compilers (see below). And so implementing the vm in a new language is in essence swapping nodes of the higher level tree with nodes of the lower level (c-ish) one. I.e. parsing should not, strictly speaking, be necessary. This node swapping is after all what the pass architecture was designed to do. But, as i said, i just can't see that happening (yet?).

Trees vs. Blocks

Speaking of the Pass architecture: I flopped. Well, maybe not so much with the actual Passes, but with the Method representation: Blocks holding Instructions, being in essence a list. That was misinformed copying from llvm, misled by the final outcome. Of course the final binary has a linear address space, but that is where the linearity ends. The natural structure of code is a tree, not a list, as demonstrated by the parse tree. Flattening it just creates navigational problems. A tree is also the easier mental model, as it is easy to imagine swapping out subtrees, expanding or collapsing nodes etc.
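A minimal sketch of what swapping out subtrees could look like when code is kept as a tree. The `Node` class, its `rewrite` method, the `lower_ints` pass and the `:load_constant` node type are all made up for this example; they are not the actual Salama classes.

```ruby
# Hypothetical tree node: a type plus children, which are either
# further nodes or plain values (names, integers).
class Node
  attr_reader :type, :children

  def initialize(type, *children)
    @type = type
    @children = children
  end

  # Build a new tree, bottom up, replacing each node with whatever
  # the block returns for it. The original tree is left untouched.
  def rewrite(&block)
    new_children = children.map { |c| c.is_a?(Node) ? c.rewrite(&block) : c }
    block.call(Node.new(type, *new_children))
  end

  # Nested array form, handy for printing and comparing.
  def to_a
    [type] + children.map { |c| c.is_a?(Node) ? c.to_a : c }
  end
end

# A "pass": swap every :int literal for a lower level :load_constant node.
def lower_ints(tree)
  tree.rewrite do |node|
    node.type == :int ? Node.new(:load_constant, node.children.first) : node
  end
end

# if(x) a = 1 else a = 2
tree = Node.new(:if,
                Node.new(:name, :x),
                Node.new(:assign, :a, Node.new(:int, 1)),
                Node.new(:assign, :a, Node.new(:int, 2)))

lower_ints(tree).to_a
# => [:if, [:name, :x],
#      [:assign, :a, [:load_constant, 1]],
#      [:assign, :a, [:load_constant, 2]]]
```

The pass only has to say which node it wants to replace; both branches of the `if` stay where they are without any label or jump bookkeeping, which is the navigational work a flat list would need.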

Soml - Salama Object Machine Language

Typed

Quite a while before this crystallized into the idea of a new language, i already saw the need for a type system. Of course, and this dates back to the first memory layouts. But i mean the need for a strong typing system, or maybe it's even clearer to call it compile time typing. The kind that c and c++ have. It is essential (mentally, this is of course all for the programmer, not the computer) to be able to think in a static type system, and then extend that and make it dynamic. Or possibly use it in a dynamic way.

This is a good example of the gap being too large: one just steps on quicksand if everything is dynamic all the time.

The way i had the implementation figured was to have different versions of the same function. In each function we would have compile time types, everything known. I'll probably still do that, just written in Soml.
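The idea of several versions of one function, each with fully known types, can be sketched roughly as below. The `SpecializedFunction` class and the `PLUS` example are hypothetical, and ruby lambdas stand in for what would really be compiled code.

```ruby
# Hypothetical sketch: one version of a function per list of argument types.
class SpecializedFunction
  attr_reader :versions

  def initialize(name, &compiler)
    @name = name
    @compiler = compiler   # builds a version for a given list of types
    @versions = {}         # [types] => version
  end

  # Pick the version matching the argument types, building it on first use.
  def call(*args)
    types = args.map(&:class)
    version = (@versions[types] ||= @compiler.call(types))
    version.call(*args)
  end
end

PLUS = SpecializedFunction.new(:plus) do |types|
  # Inside each version every type is known, nothing is dynamic.
  if types == [Integer, Integer]
    ->(a, b) { a + b }
  elsif types == [String, String]
    ->(a, b) { "#{a}#{b}" }
  else
    raise TypeError, "no version of plus for #{types.inspect}"
  end
end

PLUS.call(1, 2)        # => 3
PLUS.call("a", "b")    # => "ab"
PLUS.versions.size     # => 2
```

The dynamic part shrinks to the lookup in `call`; the bodies themselves are statically typed in spirit.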

Machine language

Soml is a machine language for the Salama machine. As i tried to implement without this layer, i was essentially implementing in assembler. Too much.

There are two main features we need from the machine language: one is a typed oo memory model, the other an oo call model.

Object c

The language needs to be object based, of course. Just because it's typed and not dynamic and closer to assembler, doesn't mean we need to give up objects. In fact we mustn't. Soml should be a little bit like c++, i.e. compile time known variable arrangement and types, objects. But no classes (or inheritance), more like structs, with full access to everything. So a struct.variable syntax would mean grab that variable at that address, no functions, no possible override, just get it. This is actually already implemented as i needed it for the slot access.
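As an illustration of struct.variable meaning "grab that variable at that address", here is a small sketch in which the field name is resolved to a fixed slot once, at compile time. The `Layout` class, the helper methods and the field names are assumptions for the example only; this is not the existing slot access code.

```ruby
# Hypothetical layout: field names mapped to fixed slot indexes.
class Layout
  def initialize(*fields)
    @index = {}
    fields.each_with_index { |name, i| @index[name] = i }
  end

  def size
    @index.size
  end

  # Looked up once, when the access is compiled. Raises KeyError for
  # an unknown field, the stand-in for a compile time error.
  def slot_of(name)
    @index.fetch(name)
  end
end

# An object is just a packet of machine words (an Array here).
def new_object(layout)
  Array.new(layout.size, 0)
end

# "Compile" object.field into a read of a fixed slot: no method
# lookup, no possible override.
def compile_get(layout, field)
  slot = layout.slot_of(field)
  ->(object) { object[slot] }
end

# Same for the write side.
def compile_set(layout, field)
  slot = layout.slot_of(field)
  ->(object, value) { object[slot] = value }
end

# Made up field names
message_layout = Layout.new(:receiver, :name, :return_address)
message = new_object(message_layout)

compile_set(message_layout, :name).call(message, 42)
compile_get(message_layout, :name).call(message)   # => 42
```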

So objects without encapsulation or classes. A lower level object orientation.

Whitequark

This new approach (and more experience) shed a new light on ruby parsing. The previous idea was to start small, write the necessary stuff in the parsable subset and with time expand that set.

Alas ... ruby is a beast to parse, and because of the semantic gap, writing the system, even in a subset, is not viable. And it turns out the brave warriors of the ruby community have already produced a pure, production ready, ruby parser. That can obviously read itself and anything else, so the start small approach is doubly out.

Interoperability

The system code needs to be callable from the higher level, and possibly the other way around. This probably means the same or compatible calling mechanism and data model. The data model is quite simple, as at the system level all is just machine words, but in object sized packets. As for the calling, it will probably mean that the same message object needs to be used and that what is now called calling at the machine level is supported. Sending of course won't be.

Still missing a piece

How the level below calling can be represented is still open. It is clear though that it does need to be present, as otherwise any kind of concurrency is impossible to achieve. The question ties in with the still open question of Quajects. Meaning, what is the yin in the yin and yang of object oriented programming. The normal yang way sees the code as active and the data as passive. By normal i mean oo implementations in which blocks and closures just fall from the sky and have no internal structure. There is obviously a piece of the puzzle missing that Alexia was onto.

Start small

The first step is to wrap the functionality i have in the Passes as a language.

Then to expand that language, by writing increasingly more complex programs in it.

And then to re-attack ruby using the whitequark parser, which probably means jumping on the mspec train.

All in all, no biggie :-)

Compilers are not free

Oh and i re-read and re-watched Tom's compilers for free talk, which did make quite an impression on me the first time. But when i really thought about actually going down that road (who doesn't enjoy a free beer), i got into the small print.

The second biggest of which is that writing a partial evaluator is just about as complicated as writing a compiler.

But the biggest problem is that the (free) compiler you could get has the implementation language of the evaluator as its output. You need a compiler to start with, in other words. Also the interpreter would have to be written in the same compilable language. So writing a ruby compiler by writing a ruby interpreter would mean writing the interpreter in c, and (worse) writing the partial evaluator for c, not for ruby.
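A toy illustration of that small print, under heavy assumptions: the tiny expression language, the `interpret` method and the hand written `specialize` method below are made up, and `specialize` is specific to this one interpreter rather than a general partial evaluator. What it does show is that the residual program comes out in the implementation language (ruby here), which then still needs something to compile or run it.

```ruby
# Toy source language, as nested arrays:
#   [:int, 3], [:var, :x], [:add, a, b], [:mul, a, b]

# The interpreter: program and input are both only known at run time.
def interpret(node, env)
  case node[0]
  when :int then node[1]
  when :var then env.fetch(node[1])
  when :add then interpret(node[1], env) + interpret(node[2], env)
  when :mul then interpret(node[1], env) * interpret(node[2], env)
  else raise ArgumentError, "unknown node #{node[0].inspect}"
  end
end

# The interpreter specialised by hand for a known program: the walk over
# the tree happens now, what is left is ruby source that only needs env.
def specialize(node)
  case node[0]
  when :int then node[1].to_s
  when :var then "env.fetch(:#{node[1]})"
  when :add then "(#{specialize(node[1])} + #{specialize(node[2])})"
  when :mul then "(#{specialize(node[1])} * #{specialize(node[2])})"
  else raise ArgumentError, "unknown node #{node[0].inspect}"
  end
end

# Wrap the residual expression so it can be evaluated into a lambda.
def compile_to_ruby(program)
  "lambda { |env| #{specialize(program)} }"
end

# 2 * x + 1
program = [:add, [:mul, [:int, 2], [:var, :x]], [:int, 1]]

source = compile_to_ruby(program)
# => "lambda { |env| ((2 * env.fetch(:x)) + 1) }"

compiled = eval(source)        # needs a ruby implementation to run
compiled.call({ x: 5 })        # => 11
interpret(program, { x: 5 })   # => 11
```

The "compiled" result is ruby source, so it is only as compiled as ruby itself is. That is the circularity described above.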

Ok, maybe it is not quite as bad as that makes it sound. As i do have the register layer ready and will be writing a c-ish language, it may even be possible to write an interpreter in soml, and then it would be ok to write an evaluator for soml too.

I will nevertheless go the straighter route for now, i.e. write a compiler, and maybe return to the promised freebie later. It does feel like a lot of what the partial evaluator is would be called compiler optimization in another lingo. So maybe the road will lead there naturally.