ruby-x.github.io/app/views/posts/_2015-9-3-a-new-language.haml
2018-04-10 22:12:47 +03:00

144 lines
7.6 KiB
Plaintext
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

%p
It is the
%strong one
thing i said i wasnt going to do: Write a language.
There are too many languages out there already, and just because i want to write a vm,
doesnt mean i want to add to the language jungle.
%strong But
%h2#the-gap The gap
%p
As it happens in life, which is why they say never to say never, it happens just like it
i didnt want. It turns out the semantic gap of what i have is too large.
%p
There is the
%strong register level
, which is approximately assembler, and there is the
%strong vm level
which is more or less the ruby level. So my head hurts from trying to implement ruby in assembler,
no wonder.
%p
Having run into this wall, which btw is the same wall that crystal ran into, one can see the sense
in what others have done more clearly: Why rubinus uses c++ underneath. Why crystal does not
implement ruby, but a statically typed language. And ultimately why there is no ruby compiler.
The gap is just too large to bridge.
%h2#the-need-for-a-language The need for a language
%p
As I have the architecture of passes, i was hoping to get by with just another layer in the
architecture. A tried an tested approach after all. And while i wont say that that isnt a
possibility, i just dont see it. I think it may be one of those where hindsight will be perfect.
%p
I can see as far as this: If i implement a language, that will mean a parser, ast and compiler.
The target will be my register layer. So a reasonable step up is a sort of object c, that has
basic integer maths and object access. Ill detail that more below, but the point is, if i have
that, i can start writing a vm implementation in that language.
%p
Off course the vm implementation involves a parser, an ast and a compiler, unless we go to the free
compilers (see below). And so implementing the vm in a new language is in essence swapping nodes of
the higher level tree with nodes of the lower level (c-ish) one. Ie parsing should not strictly
speaking be necessary. This node swapping is after all what the pass architecture was designed
to do. But, as i said, i just cant see that happening (yet?).
%h3#trees-vs-blocks Trees vs. Blocks
%p
Speaking of the Pass architecture: I flopped. Well, maybe not so much with the actual Passes, but
with the Method representation. Blocks holding Instructions, and being in essence a list.
Misinformed copying from llvm, misinformed by the final outcome. Off course the final binary
has a linear address space, but that is where the linearity ends. The natural structure of code
is a tree, not a list, as demonstrated by the parse
= succeed "." do
%em tree
%h2#soml---salama-object-machine-language Soml - Salama Object Machine Language
%h3#typed Typed
%p
Quite a while before crystallizing into the idea of a new language, i already saw the need for a type
system. Off course, and this dates back to the first memory layouts. But i mean the need for a
%em strong typing
system, or maybe its even clearer to call it compile time typing. The type that c
and c++ have. It is essential (mentally, this is off course all for the programmer, not the computer)
to be able to think in a static type system, and then extend that and make it dynamic.
Or possibly use it in a dynamic way.
%p
This is a good example of this too big gap, where one just steps on quicksand if everything is
all the time dynamic.
%p
The way i had the implementation figured was to have different versions of the same function. In
each function we would have compile time types, everything known. Ill probably still do that,
just written in Soml.
%h3#machine-language Machine language
%p
Soml is a machine language for the Salama machine. As i tried to implement without this layer, i was
essentially implementing in assembler. Too much.
%p
There are two main feature we need from the machine language, one is typed a typed oo memory model,
the other an oo call model.
%h3#object-c Object c
%p
The language needs to be object based, off course. Just because its typed and not dynamic
and closer to assembler, doesnt mean we need to give up objects. In fact we mustnt. Soml
should be a little bit like c++, ie compile time known variable arrangement and types,
objects. But no classes (or inheritance), more like structs, with full access to everything.
So a struct.variable syntax would mean grab that variable at that address, no functions, no possible
override, just get it. This is actually already implemented as i needed it for the slot access.
%p So objects without encapsulation or classes. A lower level object orientation.
%h3#whitequark Whitequark
%p
This new approach (and more experience) shed a new light on ruby parsing. The previous idea was to
start small, write the necessary stuff in the parsable subset and with time expand that set.
%p
Alas . . ruby is a beast to parse, and because of the
%strong semantic gap
writing the system,
even in a subset, is not viable. And it turns out the brave warriors of the ruby community have
already produced a pure, production ready,
= succeed "." do
%a{:href => "https://github.com/whitequark/parser"} ruby parser
%h3#interoperability Interoperability
%p
The system code needs to be callable from the higher level, and possibly the other way around.
This probably means the same or compatible calling mechanism and data model. The data model is
quite simple as the at the system level all is just machine words, but in object sized
packets. As for the calling it will probably mean that the same message object needs to be used
and what is now called calling at the machine level is supported. Sending off course wont be.
%h3#still-missing-a-piece Still missing a piece
%p
How the level below calling can be represented is still open. It is clear though that it does need
to be present, as otherwise any kind of concurrency is impossible to achieve. The question ties
in with the still open question of
= succeed "." do
%a{:href => "http://valerieaurora.org/synthesis/SynthesisOS/ch4.html"} Quajects
%h3#start-small Start small
%p The first next step is to wrap the functionality i have in the Passes as a language.
%p Then to expand that language, by writing increasingly more complex programs in it.
%p
And then to re-attack ruby using the whitequark parser, that probably means jumping on the
mspec train.
%p All in all, no biggie :-)
%h2#compilers-are-not-free Compilers are not free
%p
Oh and i re-read and re-watched Toms
%a{:href => "http://codon.com/compilers-for-free"} compilers for free
talk,
which did make quite an impression on me the first time. But when i really thought about actually
going down that road (who doest enjoy a free beer), i got into the small print.
%p
The second biggest of which is that writing a partial evaluator is just about as complicated
as writing a compiler.
%p
But the biggest problem is that the (free) compiler you could get, has the implementation language
of the evaluator, as its
= succeed "." do
%strong output
%em for
c, not for ruby.
%p
Ok, maybe it is not quite as bad as that makes it sound. As i do have the register layer ready
and will be writing a c-ish language, it may even be possible to write an interpreter
= succeed "," do
%strong in soml
%strong for soml
too.
%p
I will nevertheless go the straighter route for now, ie write a compiler, and maybe return to the
promised freebie later. It does feel like a lot of what the partial evaluator is, would be called
compiler optimization in another lingo. So may be road will lead there naturally.