move posts in directory by year
61
app/views/posts/2017/_01-02-a-new-year-a-new-name.haml
Normal file
@@ -0,0 +1,61 @@
|
||||
%h2#rubyx-compiles-ruby-to-binary RubyX compiles ruby to binary
|
||||
%p
|
||||
The previous name was from a time in ancient history: three years ago, which in internet time is over
|
||||
a decade (X years!). That was when i thought i was going to build
|
||||
a virtual machine. It has been clear for a while that what i am really doing is building a
|
||||
compiler. A new thing needs a new name and finally inspiration struck in the form of RubyX.
|
||||
%p
|
||||
It’s a bit of a shame that both domain and github were taken, but the - versions work well too.
|
||||
Renaming of the organization, repositories and changing of domain is now complete. I did not
|
||||
rewrite history, so all old posts still refer to salama.
|
||||
%p
|
||||
What i like most about the new name is the closeness to ruby; this is after all an implementation
|
||||
of ruby. The ambiguity of the X is nice too: is it as in X-files, the unknown of the
|
||||
maths variable or, à la mac, the 10 of a version number? Or the hope of achieving 10 times
|
||||
performance, as a play on the 3 times performance of ruby 3? It’s a mystery, but it is a ruby
|
||||
mystery and that is the main thing.
|
||||
%h3#type-system 2. Type system
|
||||
%p About the work that has been done, the type system rewrite is probably the biggest.
|
||||
%p
|
||||
Types are now immutable throughout the system, and the space keeps a list of all unique types.
|
||||
Adding, removing, changing type all goes through a hashing process and leads to a unique
|
||||
instance, that may have to be created.
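%p A minimal sketch of that idea, assuming hypothetical Type and Space classes (the names and methods below are illustrative, not the actual rubyx code): every change is a lookup by attribute names that either finds the unique instance or creates it.

```ruby
# Hypothetical sketch: Type and Space here are illustrative, not the rubyx API.
class Type
  attr_reader :names

  def initialize(names)
    @names = names.dup.freeze   # attribute names, in layout order
    freeze                      # a type never changes after creation
  end
end

class Space
  def initialize
    @types = {}                 # attribute names => the one Type for them
  end

  # Return the unique Type for this list of names, creating it if needed.
  def type_for(names)
    key = names.dup.freeze      # the Hash compares keys by hash and eql?
    @types[key] ||= Type.new(key)
  end

  # "Changing" a type means looking up (or creating) a different one.
  def add_attribute(type, name)
    type_for(type.names + [name])
  end

  def remove_attribute(type, name)
    type_for(type.names - [name])
  end
end

space   = Space.new
point   = space.type_for([:x, :y])
point3d = space.add_attribute(point, :z)
same    = space.type_for([:x, :y, :z])   # the very same instance as point3d
```

%p Because every path goes through type_for, two equal attribute lists always yield the same instance, so comparing types can be an identity check.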
|
||||
%h3#typedmethod-arguments-and-locals 3. TypedMethod arguments and locals
|
||||
%p
|
||||
Close on the heels of the type immutability was the change to types as argument and local variable
|
||||
descriptors. A type instance is now used to describe the arguments (names and types) uniquely,
|
||||
clearing up previous imprecision.
|
||||
%p
|
||||
Argument and local types, along with the name of the method, describe a method uniquely. Obviously
|
||||
the types may not be changed. Methods with different argument types are thus different methods, a
|
||||
fact that still has to be coded into the ruby compiler.
|
||||
%h3#arguments-and-calling-convention 4. Arguments and calling convention
|
||||
%p
|
||||
The Message used to carry the arguments, while locals were a separate frame object. An imbalance
|
||||
if one thinks about closures, as both have to be decoupled from their activation.
|
||||
%p
|
||||
Now both arguments and locals are represented as NamedLists, which are basically just objects.
|
||||
The type is transferred from the method to the NamedList instance at call time, so it is available
|
||||
at run-time. This makes the whole calling convention easier to understand.
|
||||
%h3#parfait-in-ruby 5. Parfait in ruby
|
||||
%p
|
||||
Parfait is more normal ruby now, specifically we are using instance variables in Parfait again,
|
||||
just like in any ruby. When compiling we have to deal with the mapping to indexes, but that’s what
|
||||
we have types for, so no problem. The new version simplifies the boot process a little too.
|
||||
%p Positioning has been removed from Parfait completely and pushed into the Assembler where it belongs.
|
||||
%h3#soml-goodbye 6. SOML goodbye
|
||||
%p
|
||||
All traces of the soml language have been eradicated. All that is left is an intermediate typed
|
||||
tree representation. But the MethodCompiler still generates binary so that’s good.
|
||||
Class and method generation capabilities have been removed from that compiler and now live
|
||||
one floor up, at the ruby level.
|
||||
%h3#ruby-compiler 7. Ruby Compiler
|
||||
%p
|
||||
Finally, work on the ruby compiler has started, and after all that groundwork it is actually quite easy.
|
||||
Class statements create classes already. Method definitions extract their argument and local
|
||||
variable names, and create their representation as RubyMethod. More to come.
|
||||
%p
|
||||
All in all almost all of the previous post’s todos are done. Next up is the fanning of RubyMethods
|
||||
into TypedMethods by instantiating type variations. When compilation of those works, i just need
|
||||
to implement the cross function jumps and voila.
|
||||
%p Certainly an interesting year ahead.
|
122
app/views/posts/2017/_01-10-integer-unification.haml
Normal file
@@ -0,0 +1,122 @@
|
||||
%p
|
||||
I just read mri 2.4 “unifies” Fixnum and Integer. This, it turns out, is something quite
|
||||
different from what i thought, mostly about which class names are returned.
|
||||
And that it is ok to have two implementations for the same class, Integer.
|
||||
%p
|
||||
But even if it wasn’t what i thought, it did spark an idea, and i hope a solution to a problem
|
||||
that i have seen lurking ahead. Strangely the solution may be even more radical than the
|
||||
cross function jumps it replaces.
|
||||
%h2#a-problem-lurking-ahead A problem lurking ahead
|
||||
%p As i have been thinking more about what happens when a type changes, i noticed something:
|
||||
%p
|
||||
An object may change its type in one method (A), but may be used in a method (B), far up the call
|
||||
stack. How does B know to treat the object differently? Specifically, the calls B makes
|
||||
on the object are determined by the type before the change. So they will be wrong after the change,
|
||||
and so B needs to know about the type change.
|
||||
%p
|
||||
Such a type change was supposed to be handled by a cross method jump, thus fixing the problem
|
||||
in A. But the propagation to B is cumbersome, there can be just so many of them.
|
||||
Anything that i thought of is quite a bit too involved. And this is before even thinking about closures.
|
||||
%h2#a-step-back A step back
|
||||
%p
|
||||
Looking at this from a little higher vantage there are maybe one too many things i have been trying
|
||||
to avoid.
|
||||
%p
|
||||
The first one was the bit-tagging. The ruby (and smalltalk) way of tagging an integer
|
||||
with a marker bit. Thus losing a bit and gaining a gazillion type checks. In mri c land
|
||||
an object is a VALUE, and a VALUE is either a tagged integer or a pointer to an object struct.
|
||||
So on
|
||||
%strong every
|
||||
operation the bit has to be checked. Both of these i’ve been trying to avoid.
|
||||
%p
|
||||
So that led to a system with no explicit information in the lowest level representation and
|
||||
thus a large dance to have that information in an external type system and keeping that type
|
||||
information up to date.
|
||||
%p
|
||||
Of course the elephant in the room here is that i have also been trying to avoid making integers and
|
||||
floats objects. Ie keeping their c, or machine representation, just like anyone else before me.
|
||||
Too wasteful to even think otherwise.
|
||||
%h2#and-a-step-forward And a step forward
|
||||
%p
|
||||
The inspiration that came by reading about the unification of integers was exactly that:
|
||||
%strong to unify integers
|
||||
\. Unifying with objects, ie
|
||||
%strong making integers objects
|
||||
%p
|
||||
I have been struggling with the dichotomy between integer and objects for a long time. There always
|
||||
seemed something so fundamentally wrong there. Ok, maybe if the actual hardware would do the tagging
|
||||
and that continuous checking, then maybe. But otherwise: one is a direct, the other an indirect
|
||||
value. It just seemed wrong.
|
||||
%p
|
||||
Making Integers (and floats etc) first class citizens, objects with a type, resolves the chasm
|
||||
very nicely. Of course it does so at a price, but i think it will be worth it.
|
||||
%h2#the-price-of-unification The price of Unification
|
||||
%p
|
||||
Initially i wanted to make all objects the size of a cache line or multiples thereof. This is
|
||||
something i’ll have to let go of: Integer objects should naturally be 2 words, namely the type
|
||||
and the actual value.
|
||||
%p
|
||||
So this is doubling the amount of ram used to represent integers. But maybe worse, it makes them
|
||||
subject to garbage collection. Both can probably be alleviated by having the first 256 pinned, ie
|
||||
a fixed array, but still.
|
||||
%p
|
||||
Also using a dedicated memory manager for them and keeping a pool of unused as a linked list
|
||||
should make it quick. And of course the main hope lies in the fact that your average program
|
||||
nowadays (especially oo) does not really use integers all that much.
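%p A rough sketch of the two mitigations, pinned small integers and a free list of unused objects. IntegerObject and IntegerPool are hypothetical names invented for this illustration, and ruby objects stand in for the two-word memory layout.

```ruby
# Hypothetical sketch: IntegerObject and IntegerPool are invented names.
class IntegerObject
  # value is the second word; next_free links unused objects into a list
  attr_accessor :value, :next_free
end

class IntegerPool
  PINNED = 256

  def initialize
    @pinned = Array.new(PINNED) { |i| make(i) }  # fixed array, never collected
    @free = nil                                   # head of the unused list
  end

  # Hand out an integer object, reusing an unused one when possible.
  def get(value)
    return @pinned[value] if pinned?(value)
    int = @free
    if int
      @free = int.next_free
      int.next_free = nil
    else
      int = IntegerObject.new
    end
    int.value = value
    int
  end

  # Put an object that is no longer referenced back on the unused list.
  def release(int)
    return if pinned?(int.value)
    int.next_free = @free
    @free = int
  end

  private

  def pinned?(value)
    value.between?(0, PINNED - 1)
  end

  def make(value)
    int = IntegerObject.new
    int.value = value
    int
  end
end

pool = IntegerPool.new
big  = pool.get(1000)
pool.release(big)
reused = pool.get(2000)   # comes from the unused list, no new allocation
```

%p In this sketch both getting and releasing are a couple of pointer moves, which is where the hope of it being quick comes from.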
|
||||
%h2#oo-to-the-rescue OO to the rescue
|
||||
%p
|
||||
Of course this is not the first time my thoughts have strayed that way. There are two reasons why
|
||||
they quickly scuttled back home to known territory before. The first was the automatic optimization
|
||||
reflex: why use 2 words for something that can be done in one, and all that gc on top.
|
||||
%p
|
||||
But the second was probably even more important: If we then have the value inside the object
|
||||
(as a sort of instance variable or array element), then when we return it we have the “naked”
|
||||
integer wreaking havoc in our system, as the code expects objects everywhere.
|
||||
And if we don’t return it, then how do operations happen, since machines only operate on values.
|
||||
%p
|
||||
The thing that i had not considered is that that line of thinking is mixing up the levels
|
||||
of abstraction. It assumes a lower level than one needs: What is needed is that the system
|
||||
knows about integer objects (in a similar way that the other way assumes knowledge of integer
|
||||
values.)
|
||||
%p
|
||||
Concretely the “machine”, or compiler, needs to be able to perform the basic Integer operations,
|
||||
on the Integer objects. This is really not so different from it knowing how to perform the
|
||||
operations on two values. It just involves getting the actual values from the object and
|
||||
putting them back.
|
||||
%p
|
||||
OO helps in another way that never occurred to me.
|
||||
%strong Data hiding:
|
||||
we never actually pass out
|
||||
the value. The value is private to the object and not accessible from the outside. In fact it is not
|
||||
even accessible from the inside to the object itself. Admittedly this means more functionality in
|
||||
the compiler, but since that is a solved problem (see builtin), it’s ok.
|
||||
%h2#unified-method-caching Unified method caching
|
||||
%p
|
||||
So having gained this unification, we can now determine the type of an object very very easily.
|
||||
The type will
|
||||
%em always
|
||||
be the first word of the memory that the object occupies. We don’t have
|
||||
immediate values anymore, so always is always.
|
||||
%p
|
||||
This is
|
||||
%em very
|
||||
handy, since we have given up being god and thus knowing everything at any time.
|
||||
In concrete terms this means that in a method, we can
|
||||
%em not
|
||||
know what type an object is.
|
||||
In fact it’s worse, we can’t even say what type it is, even if we have checked it, but after we
|
||||
have passed it as an argument to another method.
|
||||
%p
|
||||
Luckily programs are not random, and it is quite rare for an object to change type, and so a given
|
||||
object will usually have one of a very small set of types. This can be used to do method caching.
|
||||
Instead of looking up the method statically and calling it unconditionally at run-time, we will
|
||||
need some kind of lookup at run-time.
|
||||
%p
|
||||
The lookup tables can be objects that the method carries. A small table (3 entries) with pairs of
|
||||
type vs jump address. A little assembler to go through the list and jump, or in case of a miss
|
||||
jump to some handler that does a real lookup in the type.
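%p A sketch of such a table in plain ruby rather than assembler. CallCache is a made-up name, symbols stand in for jump addresses, and evicting the oldest entry on overflow is an assumption of this example, not something decided above.

```ruby
# Hypothetical sketch: CallCache is invented for illustration.
class CallCache
  SIZE = 3

  def initialize(&slow_lookup)
    @entries = []               # pairs of [type, target], at most SIZE
    @slow_lookup = slow_lookup  # the handler doing the real lookup in the type
  end

  def target_for(type)
    @entries.each do |cached_type, target|
      return target if cached_type.equal?(type)   # hit: this is where we jump
    end
    target = @slow_lookup.call(type)              # miss: real lookup
    @entries.shift if @entries.size >= SIZE       # assumed policy: drop oldest
    @entries << [type, target]
    target
  end
end

lookups = 0
types = { int: Object.new, str: Object.new }
cache = CallCache.new do |type|
  lookups += 1
  type.equal?(types[:int]) ? :int_plus : :str_plus
end

cache.target_for(types[:int])   # miss, fills the first entry
cache.target_for(types[:int])   # hit
cache.target_for(types[:str])   # miss, fills the second entry
```

%p The type comparison is an identity check, which matches the type always being the first word of the object.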
|
||||
%p
|
||||
In a distant future a smaller version may be created. For the case where the type has been
|
||||
checked already during the method, a further check may be inlined completely into the code and
|
||||
only revert to the table in case of a miss. But that’s down the road a bit.
|
||||
%p Next question: How does this work with Parfait? Or the interpreter?
|
@@ -0,0 +1,88 @@
|
||||
%p
|
||||
As i said in the last post, a step back and forward, possibly two, was taken and understanding
|
||||
grows again. Especially when i think that some way is the way, it always changes and i turn out
|
||||
to be at least partially wrong. The way of life, of imperfect intelligence, to strive for that
|
||||
perfection that is forever out of reach. Here’s the next installment.
|
||||
%h2#slopes-and-ramps Slopes and Ramps
|
||||
%p
|
||||
When thinking about method caching and how to implement it i came across this thing that i will
|
||||
call a Slope for now. The Slope of a function that is. At least that’s where the thought started.
|
||||
%p The Slope of a function is a piece of code that has two main properties:
|
||||
%ul
|
||||
%li
|
||||
it is straight, up to the end. i mean it has no branches from the outside.
|
||||
It may have internally but that does not affect anything.
|
||||
%li it ends in a branch that returns (a call), but this is not part of the Slope
|
||||
%p
|
||||
Those
|
||||
%em two
|
||||
properties would better be called a Ramp. The Ramp the function goes along before it
|
||||
jumps off to the next function.
|
||||
%p
|
||||
The
|
||||
%strong Slope
|
||||
is the part before the jump. So a Ramp is a Slope and a Jump.
|
||||
%p
|
||||
Code in the Slope, it struck me, has the unique possibility of doing a jump, without worrying about
|
||||
returning. After all, it knows there is a call coming. After contemplating this a little i
|
||||
found the flaw, which one understands when thinking about where the function returns to. So Slope
|
||||
can jump away without caring if (and only if) the return address is set to after that jump (and the
|
||||
address is actually set by the code before the jump).
|
||||
%p
|
||||
Remembering that we set the return address in the caller (not as in c the callee) we can arrange
|
||||
for that. And so we can write Slope code that just keeps going. Because once the return address
|
||||
is set up, the code can just keep jumping forward. The only thing is that the call must come.
|
||||
%p
|
||||
In more concrete terms: Method caching can be a series of checks and jumps. If the check is ok
|
||||
we call, otherwise jump on. And even the last fail (the switch’s default case) can be a jump
|
||||
to what we would otherwise call a method. A method that determines the real jump target from
|
||||
the type (of self, in the message) and calls it. Except it’s not a method because it never
|
||||
returns, which is symmetrical to us not calling it.
|
||||
%p
|
||||
So this kind of “method” which is not really a method, but still a fair bit of logic, i’ll call
|
||||
a Slope.
|
||||
%h2#links-and-chains Links and Chains
|
||||
%p
|
||||
A Slope, the story continues, is really just a specific case of something else. If we take away
|
||||
the expectation that a call is coming, we are left with a sequence of code with jumps to more
|
||||
code. This could be called a Chain, and each part of the Chain would be a Link.
|
||||
%p
|
||||
To define that: a
|
||||
%strong Link
|
||||
is a sequence of code that ends in a jump. It has no other jumps, just
|
||||
the one at the end. And the jump at the end jumps to another Link.
|
||||
%p The Code i am talking about here is risc level code, one could say assembler instructions.
|
||||
%p
|
||||
The concept though is very familiar: at a higher level the Link would be a Statement and a
|
||||
Chain a sequence of Statements. We’re missing the branch abstraction yet, but otherwise this is
|
||||
a lower level description of code in a similar way as the typed level Code and Statements are
|
||||
a description of higher level code.
|
||||
%h2#typed-level-is-wrong Typed level is wrong
|
||||
%p
|
||||
The level that is nowadays called Typed, and used to be soml, is basically made up of language
|
||||
constructs. It does not allow for manipulation of the risc level. As the ruby level is translated
|
||||
to the typed level, which in turn is translated to the risc level, the ruby compiler has no
|
||||
way of manipulating the risc level. This is as it should be.
|
||||
%p
|
||||
The problem is just, that the constructs that are currently at the typed level, do not allow
|
||||
to express the results needed at the risc level.
|
||||
%p
|
||||
Through the history of the development the levels have become mixed up. It is relatively clear at
|
||||
the ruby level what kind of construct is needed at the risc level. This is what has to drive the
|
||||
constructs at the typed level. We need access to these kinds of Slope or Link ideas at the ruby
|
||||
level.
|
||||
%p
|
||||
Another way of looking at the typed level inadequacies is the size of the codes generated. Some of
|
||||
the expressions (or statements) resolve to 2 or 3 risc instructions. Others, like the call, are
|
||||
15. This is an indication that part of the level is wrong. A good way to architect the layers
|
||||
would result in an
|
||||
%em even
|
||||
expansion of the amount of code at every level.
|
||||
%h2#too-little-testing Too little testing
|
||||
%p
|
||||
The ruby compiler should really drive the development more. The syntax and behavior of ruby are
|
||||
quite clear, and i feel the risc layer is quite a solid target. So before removing too much or
|
||||
rewriting too much i shall just add more (and more) functionality to the typed layer.
|
||||
%p
|
||||
At the same time some of the concepts (like a method call) will probably not find any use, but
|
||||
as long as they don’t harm, i shall leave them lying around.
|
90
app/views/posts/2017/_03-03-layer-summary.haml
Normal file
@@ -0,0 +1,90 @@
|
||||
%p Going on holiday without a computer was great, forcing me to recap and write things down on paper.
|
||||
%h2#layers Layers
|
||||
%p
|
||||
One of the main results was that the current layers are a bit mixed up and that will have to be
|
||||
fixed. But first, some of the properties by which i think of the different layers.
|
||||
%h3#layer-properties Layer properties
|
||||
%p
|
||||
%strong Structure of the representation
|
||||
is one of the main distinctions between the layers. We know the parser gives us a
|
||||
%strong tree
|
||||
and that the produced binary is a
|
||||
= succeed "," do
|
||||
%strong blob
|
||||
%p
|
||||
A closely related property of the representation is whether it is
|
||||
= succeed "." do
|
||||
%strong abstract or concrete
|
||||
%p
|
||||
If we think of the layer as a language, what
|
||||
%strong Language level
|
||||
would it be, assembler, c, oo.
|
||||
Does it have
|
||||
= succeed "," do
|
||||
%strong control structures
|
||||
= succeed "." do
|
||||
%strong jumps
|
||||
%h3#ruby-layer Ruby Layer
|
||||
%p
|
||||
The top ruby layer is a given, since it is provided by the external gem
|
||||
= succeed "." do
|
||||
%em parser
|
||||
= succeed "." do
|
||||
%em tree
|
||||
%p
|
||||
It might sound self-evident that this layer is very close to ruby, but this means that it inherits
|
||||
all of ruby’s quirks, and all the redundancy that makes ruby a nice language. By quirks i mean
|
||||
things like the integer 0 being true in an if statement. A good example of redundancy is the
|
||||
existence of both if and unless, or the ability to add if after the statement.
|
||||
%h3#virtual-language Virtual Language
|
||||
%p
|
||||
The next layer down, and the first to be defined in ruby-x, is the virtual language layer.
|
||||
By language i mean object oriented language, and by virtual a non-existent minimal version of an
|
||||
object oriented language. This is like ruby, but without the quirks or redundancy. This is
|
||||
meant to be compatible with other oo languages, meaning that it should be possible to transform
|
||||
a python or smalltalk program into this layer.
|
||||
%p
|
||||
The layer is represented as a concrete tree and derived from the ast by removing:
|
||||
\- unless, the ternary operator and post conditionals
|
||||
\- splats and multi-assignment
|
||||
\- implicit block passing
|
||||
\- case statement
|
||||
\- global variables
|
||||
%p It should be relatively obvious how these can be replaced by existing constructs (details in code)
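%p To illustrate, here are three of the removed constructs next to an equivalent written with a plain if. This is only a sketch in ordinary ruby with made-up method names; it does not show the actual transformation code.

```ruby
# Each pair behaves the same; the second form uses only a plain if.

def post_unless(n)
  return :small unless n > 10      # unless as a post conditional
  :big
end

def plain_if(n)
  if !(n > 10)
    return :small
  end
  :big
end

def ternary(n)
  n.zero? ? :zero : :other
end

def ternary_as_if(n)
  if n.zero?
    :zero
  else
    :other
  end
end

def with_case(x)
  case x
  when Integer then :int
  when String  then :str
  else :unknown
  end
end

def case_as_if(x)
  # case/when is a chain of === tests on the when values
  if Integer === x
    :int
  elsif String === x
    :str
  else
    :unknown
  end
end
```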
|
||||
%h3#virtual-object-machine Virtual object Machine
|
||||
%p
|
||||
The next layer down represents what we think of as a machine, more than a language, and an object
|
||||
oriented one at that.
|
||||
%p
|
||||
A differentiating factor is that a machine has no control structures like a language. Only jumps.
|
||||
The logical structure is more a stream or array. Something closer to the memory that
|
||||
i will map to in lower layers. We still use a tree representation for this level, but with the
|
||||
interpretation that neighboring children get implicitly jumped to.
|
||||
%p
|
||||
The machine deals in objects, not in memory as a von Neumann machine would. The machine has
|
||||
instructions to move data from one object to another. There are no registers, just objects.
|
||||
Also basic arithmetic and testing is covered by the instruction set.
|
||||
%h3#risc-layer Risc layer
|
||||
%p
|
||||
This layer is a minimal abstraction of an arm processor. Ie there are eight registers, instructions
|
||||
to and from memory and between registers. Basic integer operations work on registers. So does
|
||||
testing, and of course there are jumps. While the layer deals in random access memory, it is
|
||||
aware of and uses the object machine’s objects.
|
||||
%p
|
||||
The layer is minimal in the sense that it defines only instructions needed to implement ruby.
|
||||
Instructions are defined in a concrete manner, ie one class per Instruction, which makes the
|
||||
set of Instructions extensible by other gems.
|
||||
%p
|
||||
The structure is a linked list which is mainly interested in three types of Instructions. Namely
|
||||
Jumps, jump targets (Labels), and all others. All the other Instructions are linear in the von Neumann
|
||||
sense, that the next instruction will be executed implicitly.
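%p A small sketch of that structure, with one class per instruction and a tiny loop that walks the list. The class names (Instruction, Label, Jump, Emit) and the run method are hypothetical stand-ins, not the real risc classes.

```ruby
# Hypothetical sketch: these classes are illustrative, not the rubyx risc layer.
class Instruction
  attr_accessor :next            # the implicit successor in the list

  # Add an instruction at the end of the list, return the list head.
  def append(instruction)
    last = self
    last = last.next while last.next
    last.next = instruction
    self
  end
end

class Label < Instruction        # a jump target, does nothing itself
  attr_reader :name
  def initialize(name)
    @name = name
  end
end

class Jump < Instruction         # the only instruction that breaks linearity
  attr_reader :target
  def initialize(target)
    @target = target
  end
end

class Emit < Instruction         # stands in for "all other" instructions
  attr_reader :value
  def initialize(value)
    @value = value
  end
end

# Walk the list: jumps go to their label, everything else falls through.
def run(first)
  out = []
  current = first
  while current
    if current.is_a?(Jump)
      current = current.target
      next
    end
    out << current.value if current.is_a?(Emit)
    current = current.next
  end
  out
end

done = Label.new(:done)
code = Emit.new(1)
code.append(Jump.new(done)).append(Emit.new(2)).append(done).append(Emit.new(3))
result = run(code)               # the Emit of 2 is jumped over
```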
|
||||
%h3#arm-and-elf-layer Arm and elf Layer
|
||||
%p
|
||||
The mapping of the risc layer to the arm layer is very straightforward, basically one to one with
|
||||
the exception of constant loading (which is quirky on the arm 32 bit due to historical reasons).
|
||||
Arm instructions (being instructions of a real cpu), have the ability to assemble themselves into
|
||||
binary, which apart from the loading are 4 bytes.
|
||||
%p The structure of the Arm instruction is the same as the risc layer, a linked list.
|
||||
%p
|
||||
There is also code to assemble the objects, and with the instruction stream make a binary elf
|
||||
executable. While elf support is minimal, the executable does execute on raspberry pi or qemu.
|
79
app/views/posts/2017/_04-07-how-not-to-interpret.haml
Normal file
@@ -0,0 +1,79 @@
|
||||
%p Method caching can be done at language level. Wow. But first some boring news:
|
||||
%h2#vool-is-ready-mom-is-coming Vool is ready, Mom is coming
|
||||
%p
|
||||
The
|
||||
= succeed "irtual" do
|
||||
%strong V
|
||||
= succeed "bject" do
|
||||
%strong O
|
||||
= succeed "riented" do
|
||||
%strong O
|
||||
= succeed "anguage" do
|
||||
%strong L
|
||||
%p
|
||||
Vool will not reflect some of ruby’s more advanced features, like splats or implicit blocks,
|
||||
and hopes to make the conditional logic more consistent.
|
||||
%p
|
||||
The
|
||||
= succeed "inimal" do
|
||||
%strong M
|
||||
= succeed "bject" do
|
||||
%strong O
|
||||
= succeed "achine" do
|
||||
%strong M
|
||||
%h2#inline-method-caching Inline Method caching
|
||||
%p
|
||||
In ruby almost all work is actually done by method calling and an interpreter spends much of its
|
||||
time looking up methods to call. The obvious thing to do is to cache the result, and this has
|
||||
been the plan for a while.
|
||||
%p
|
||||
Of course for caching to work, one needs a cache key and invalidation strategy, both of which
|
||||
are handled by the static types, which i’ll review below.
|
||||
%h3#small-cache Small cache
|
||||
%p
|
||||
Aaron Patterson has done
|
||||
%a{:href => "https://www.youtube.com/watch?v=b77V0rkr5rk"} research into method caching
|
||||
in mri and found that most call sites (>99%) only need one cache entry.
|
||||
%p
|
||||
This means a single small object can carry the information needed, probably type, function address
|
||||
and counter, times two.
|
||||
%p
|
||||
In rubyx this can literally be an object that we attach to the CallSite, either prefill if possible
|
||||
or leave to be used at runtime.
|
||||
%h3#method-lookup-is-a-static-function Method lookup is a static function
|
||||
%p
|
||||
The other important idea here is that the actual lookup of a method is a known function. Known at
|
||||
compile time that is.
|
||||
%p
|
||||
Thus dynamic dispatch can be substituted by a cache lookup, and a static call. The result of the call
|
||||
can/should update the cache and then we can start with the lookup again.
|
||||
%p
|
||||
This makes it possible to remove dynamic dispatch from the code, actually at code level.
|
||||
I had previously thought of implementing the send at a lower level, but see now that it would
|
||||
be quite possible to do it at the language level with an if and a call, possibly another call
|
||||
for the miss. That would drop the language down from dynamic (4th level) to static (3rd level).
|
||||
%p I am still somewhat at odds whether to actually do this or leave it for the machine level (mom).
|
||||
%h2#static-type-review Static Type review
|
||||
%p
|
||||
To make the caching possible, the cache key - value association has to be constant.
|
||||
Of course in oo systems the class of an object is constant and so we could just use that.
|
||||
But in ruby you can change the class, add instance variables or add/remove/change methods,
|
||||
and so the class as a key and the method as value is not correct over time.
|
||||
%p
|
||||
In rubyx, an object has a type, and its type can change. But a type itself can never change. A type refers
|
||||
to the class that it represented at the time of creation. Conversely a class carries an instance
|
||||
type, which is the type of new instances that get created. But when variables or methods are added
|
||||
or removed from the class, a new type is created. Type instances never change. Method implementations
|
||||
are attached to types, and once compiled, never changed either.
|
||||
%p
|
||||
Thus using the object’s type as the cache key and the method as its value will stay correct over time.
|
||||
And the double bonus of this is that it takes care both of objects of different classes (as those will have different types for sure), and of objects of the same class, at different times, when
|
||||
eg a method with the same name has been added. Those objects will have different type too, and
|
||||
thus experience a cache miss and have their correct method found.
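%p As a sketch of why the type works as a key: the call site below caches one type and method pair, done with an if and a call as described earlier. Type, Klass and CallSite are simplified stand-ins made up for this example (the type is passed in directly instead of being read from an object), not the real Parfait classes.

```ruby
# Hypothetical sketch: Type, Klass and CallSite are illustrative stand-ins.
class Type
  attr_reader :method_table

  def initialize(method_table)
    @method_table = method_table.dup.freeze
    freeze                                  # type instances never change
  end
end

class Klass
  attr_reader :instance_type                # the type of newly created instances

  def initialize
    @instance_type = Type.new({})
  end

  # Adding a method never touches the old type, it creates a new one.
  def add_method(name, body)
    @instance_type = Type.new(@instance_type.method_table.merge(name => body))
  end
end

class CallSite
  attr_reader :misses

  def initialize(name)
    @name = name
    @cached_type = nil
    @cached_method = nil
    @misses = 0
  end

  def call(type, *args)
    if !@cached_type.equal?(type)                        # the if: cache check
      @misses += 1
      @cached_method = type.method_table.fetch(@name)   # the static lookup
      @cached_type = type
    end
    @cached_method.call(*args)                           # the static call
  end
end

klass = Klass.new
klass.add_method(:greet, ->(name) { "hello #{name}" })
old_type = klass.instance_type
site = CallSite.new(:greet)

site.call(old_type, "a")                   # miss, fills the cache
site.call(old_type, "b")                   # hit
klass.add_method(:greet, ->(name) { "hi #{name}" })
site.call(klass.instance_type, "c")        # new type, so a miss and a fresh lookup
```

%p Nothing ever has to invalidate the cache: an object that still carries the old type keeps finding the old method, and one with the new type simply misses.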
|
||||
%h2#up-next Up next
|
||||
%p
|
||||
More grunt-work. Now that Vool replaces the ast the code from rubyx/passes has to be “ported” to use it. That means:
|
||||
\- class extraction and class object creation
|
||||
\- method extraction and creation
|
||||
\- type creation by ivar analysis
|
||||
\- frame creation by local variable analysis
|
@@ -0,0 +1,92 @@
|
||||
%p
|
||||
While work on Mom (Minimal object machine) continues, i can see the future a little clearer.
|
||||
Alas, for now the shortest route is best, so the future will have to wait. But here is what i’m
|
||||
thinking.
|
||||
%h2#types-today Types today
|
||||
%p
|
||||
The
|
||||
%a{:href => "/rubyx/layers.html"} architecture
|
||||
document outlines this in more detail, but in short:
|
||||
%ul
|
||||
%li types are immutable
|
||||
%li every object has a type (which may change)
|
||||
%li a type implements the interface of a class at a given time
|
||||
%li a type is defined by a list of attribute names
|
||||
%p
|
||||
%img{:alt => "Types diagram", :src => "/assets/types.jpg"}/
|
||||
%h3#how-classes-work How classes work
|
||||
%p
|
||||
So the interesting thing here is how the classes work. Seeing as they are open, attributes can
|
||||
be added and removed, but the types are immutable.
|
||||
%p The solution is easy: when a new attribute is added to a class, a new type is created.
|
||||
%p
|
||||
The
|
||||
%em instance type
|
||||
is then updated to point to the current type. This means that new objects will
|
||||
be created with the new type, and old ones will keep their old type. Until the attribute is
|
||||
added to them too, in which case their
|
||||
%em type
|
||||
is updated too.
|
||||
%p
|
||||
%strong Methods
|
||||
btw are stored at the Type, as they encode the knowledge of the memory layout
|
||||
that comes with the type, into the code of the method. Remember: full data hiding, only an object’s own
|
||||
methods can access the variables, hence the type needs to be known only for
|
||||
= succeed "." do
|
||||
%em self
|
||||
%h2#the-future-of-types The future of types
|
||||
%p
|
||||
But what i wanted to talk about is how this picture is going to change in the future.
|
||||
To understand why we might want to, let’s look at method dispatch on an instance variable.
|
||||
%p
|
||||
When you write something like @me.length, the compiler can check that @me is indeed an instance variable by checking the type of self. But since no information is stored about the type of
|
||||
%em me
|
||||
, a dynamic dispatch is needed to call
|
||||
= succeed "." do
|
||||
%em length
|
||||
%p
|
||||
The simple idea is to get rid of this dynamic dispatch by storing the type of instance variables
|
||||
too. This makes a lot of calls faster, but it does come at a significant cost:
|
||||
\- every assignment to the variable has to be checked for type.
|
||||
\- many more types must be created to differentiate the variables by name
|
||||
%strong and
|
||||
type.
|
||||
%p
|
||||
Neither of those may sound so bad at first, but it’s the cumulative effects that make a
|
||||
difference. Instance assignment is one of the only two ways to move data around in an oo machine.
|
||||
That’s a lot of checking. And Types hold the methods, so for every new type
|
||||
%em all
|
||||
methods have
|
||||
to be
|
||||
%em a
|
||||
stored, and
|
||||
%em b
|
||||
created/compiled .
|
||||
%p But of course the biggest thing is all the coding this entails. So that’s why it’s in the future :-)
|
||||
%h2#multilayered-mom Multilayered Mom
|
||||
%p
|
||||
Just a note on Mom: this was meant to be a bridge between the language layer (vool) and the machine
|
||||
layer (risc). This step, from tree and statements, to list and low level instructions was deemed
|
||||
too big, so the abstract Minimal Object Machine is supposed to be a layer in between those.
|
||||
And it is of course.
|
||||
%p
|
||||
What i didn’t fully appreciate before starting was that the two things are related. I mean
|
||||
statements lend themselves to a tree, while having instructions in a tree is kind of silly.
|
||||
Similarly statements in a list don’t really make sense either. So it ended up being a two step
|
||||
process inside Mom.
|
||||
%p
|
||||
The
|
||||
%em first
|
||||
pass that transforms vool, keeps the tree structure. But it does introduce Mom’s own
|
||||
instructions. It turns out that this is sensible for exactly the linear parts of code.
|
||||
%p
|
||||
The
|
||||
%em second
|
||||
pass flattens the remaining control structures into jumps and labels. The result
|
||||
maps to the risc layer 1 to n, meaning every Mom instruction simply expands into one or usually
|
||||
more risc instructions.
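As a rough illustration of the second pass, the following ruby sketch flattens a tree containing an if statement into a list with labels and jumps. The `Instruction` and `IfStatement` structs, the instruction names and the label naming are all assumptions made for this example, not the actual Mom classes.

```ruby
# Hypothetical node types, for illustration only.
Instruction = Struct.new(:name, :args)
IfStatement = Struct.new(:condition, :if_true, :if_false)

# Flatten a tree (arrays are sequences) into one linear list of instructions.
# Control structures become a check, jumps and labels; everything else is
# copied through in order.
def flatten(node, out = [], counter = [0])
  case node
  when IfStatement
    id = (counter[0] += 1)
    # assumed semantics: :check jumps to the else label when the condition is false
    out << Instruction.new(:check, [node.condition, "else_#{id}"])
    flatten(node.if_true, out, counter)
    out << Instruction.new(:jump, ["end_#{id}"])
    out << Instruction.new(:label, ["else_#{id}"])
    flatten(node.if_false, out, counter)
    out << Instruction.new(:label, ["end_#{id}"])
  when Array
    node.each { |child| flatten(child, out, counter) }
  else
    out << node
  end
  out
end

tree = [
  Instruction.new(:slot_load, [:a]),
  IfStatement.new(:a,
                  [Instruction.new(:simple_call, [:yes])],
                  [Instruction.new(:simple_call, [:no])]),
]

flat = flatten(tree)
puts flat.map(&:name).inspect
```

The linear parts pass through unchanged, which matches the observation that the tree only matters for control structures. Each element of the resulting list could then be expanded on its own into one or more lower level instructions.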
|
||||
%p
|
||||
In the future i envision that this intermediate representation at the Mom level will be a
|
||||
good place for further optimisations, but we shall see. At least the code is still recognisable,
|
||||
meaning relatively easy to reason about. This is a property that the risc layer really does
|
||||
not have anymore.
|
99
app/views/posts/2017/_11-11-its-about-self-control.haml
Normal file
@ -0,0 +1,99 @@
|
||||
%p Since i currently have no time to do actual work, i’ve been doing some research.
|
||||
%p
|
||||
Reading about other implementations, especially transpiling ones. Opal, ruby to
|
||||
javascript, and jruby, ruby to java, or jvm instructions.
|
||||
%h2#reconsidering-the-madness Reconsidering the madness
|
||||
%p
|
||||
One needs to keep an open mind of course. “Reinventing” the wheel is not good, they
|
||||
say. Of course we don’t invent any wheels in IT, we just like the way that sounds,
|
||||
but even building a wheel, when you can buy one, is bad enough.
|
||||
And of course i have looked at using other people’s code from the beginning.
|
||||
%p
|
||||
A special eye went towards the go language this time. Go has a built-in assembler, i
|
||||
didn’t know that. Sure compilers use assembler stages, but the thing about go’s
|
||||
spin on it is that it is quite close to what i call the risc layer. I.e. it is machine
|
||||
independent and abstracts many of
|
||||
%em real
|
||||
assemblers’ quirks away. And also go does
|
||||
not expose the full assembler spectrum, so there are ways to write assembler within
|
||||
go. All very promising.
|
||||
%p
|
||||
Go has closures, also very nice, and what they call escape analysis. Meaning that while
|
||||
normally go will use the stack for locals, it has checks for closures and moves
|
||||
variables to the heap if need be.
|
||||
%p
|
||||
So many goodies. And then there is the runtime and all that code that exists already,
|
||||
so the std lib would be a straight pass through, much like mri. On top one of the best
|
||||
gcs i’ve heard about, tooling, lots of code, interoperability and a community.
|
||||
%p
|
||||
The price is of course that one (me) would have to become an expert in go. Not too
|
||||
bad, but still. As a preference i naturally tend to ruby, but maybe one can devise
|
||||
a way to automate the bridge somewhat. Already found a gem to make extensions in go.
|
||||
%p
|
||||
And, while looking, there seems to be one or two ruby in go projects already out there.
|
||||
Unfortunately interpreters :-(
|
||||
%h2#sort-of-dealbreaker Sort of dealbreaker
|
||||
%p
|
||||
Looking deeper into transpiling and using the go runtime i read about the type system.
|
||||
It’s a good type system i think, and go even provides reflection. So it would be
|
||||
nice to use it. This would provide good interoperability with go and use the existing
|
||||
facilities.
|
||||
%p
|
||||
Just to rule out the alternative: One could use arrays as the basic structure to build
|
||||
objects. Much in the same way MRI does. This would mean
|
||||
%em not
|
||||
using the type system,
|
||||
but instead building one. Thinking of the wheels … no, no go.
|
||||
%p
|
||||
So a go type for each of what we currently have as Type. Since the current system
|
||||
is built around immutable types, this seems a good match. The only glitch is that,
|
||||
e.g. when adding an instance variable or method to an existing object, the type of that object
|
||||
would have to change. A glitch, nothing more, just breaking the one constant static
|
||||
languages are built on. But digging deep into the go code, i am relatively
|
||||
certain one could deal with that.
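To make the glitch concrete, here is a minimal ruby sketch of immutable, unique types, assuming a type is identified only by its list of variable names. The `Type.for` and `add_instance_variable` names are hypothetical and simplified, not the rubyx API. Adding a variable never changes a type, it looks up (or creates) a different one, so the object has to switch types.

```ruby
# Hypothetical, simplified model of immutable types kept unique by hashing.
class Type
  @types = {}

  class << self
    # Return the one unique Type for this list of names, creating it if needed.
    def for(names)
      key = names.dup.freeze
      @types[key] ||= new(key)
    end
  end

  attr_reader :names

  def initialize(names)
    @names = names
    freeze # a type never changes after creation
  end

  # Adding a variable yields another (unique) type; self stays untouched.
  def add_instance_variable(name)
    return self if names.include?(name)
    Type.for(names + [name])
  end
end

point   = Type.for([:x, :y])
point3d = point.add_instance_variable(:z)

puts point.names.inspect
puts point3d.names.inspect
puts point3d.equal?(Type.for([:x, :y, :z]))
```

In a dynamic implementation the object simply points at the new type afterwards. In a statically typed target language that is exactly the step that has no direct equivalent.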
|
||||
%p
|
||||
Digging deeper i read more about the go interfaces. I really can’t see a way to have
|
||||
%em only
|
||||
specific (typed) methods or instances. I mean the current type model is about
|
||||
type names and the number of slots, not typing every slot, as go does. Or for methods,
|
||||
the idea is to have a name and a certain number of arguments, and specific implementations for each type of self. Not a separate implementation for each possible combination of types. This means using go’s interfaces for variables and methods.
|
||||
%p
|
||||
And here it comes: When using the reflect package to ensure the type safety at runtime,
|
||||
go is really slow.
|
||||
10+
|
||||
%a{:href => "http://blog.burntsushi.net/type-parametric-functions-golang/"} times slower
|
||||
maybe. I’m guessing it is not really their priority.
|
||||
%p
|
||||
Also, from an architecture kind of viewpoint, having all those interfaces doesn’t seem
|
||||
good. Many small objects, basically one interface object for every object
|
||||
in the system, just adds lots of load. Unnecessary, ugly.
|
||||
%h2#the-conclusion The conclusion
|
||||
%p I just read about a go proposal to have int overflow panic. Too good.
|
||||
%p
|
||||
But in the end, i’ve decided to let go go. In some ways it would seem transpiling
|
||||
to C would be much easier. Use the array, bake our types, bend those pointers.
|
||||
While go is definitely the much better language for working in, for transpiling into
|
||||
it seems to put up more hurdles than provide help.
|
||||
%p
|
||||
Having considered this, i can understand rubinius’s choice of c++ much better.
|
||||
The object model fits well. Given just a single slot for dynamic expansion one
|
||||
could make that work. One would just have to use the c++ classes as types, not as ruby
|
||||
classes. Classes are not types, not when you can modify them!
|
||||
%p But at the end it is not even about which code you’re writing, how good the fit.
|
||||
%p
|
||||
It is about design, about change. To make this work (this meaning compiling a dynamic language to binary), flexibility is the key. It’s not done, much is unclear, and one
|
||||
must be able to change and change quickly.
|
||||
%p
|
||||
Self change, just like in life, is the only real form of control. To maximise that
|
||||
i didn’t use metasm or llvm, and it is also the reason go will not feature in this
|
||||
project. At the risk of never actually getting there, or having no users. Something
|
||||
Sinatra sang comes to mind, about doing it a specific way :-)
|
||||
%p
|
||||
There is still a lot to be learnt from go though, as much from the language as the
|
||||
project. I find it inspiring that they moved from a c to a go compiler in a minor
|
||||
version. And that what must be a major language in google has fewer commits than
|
||||
rails. It does give hope.
|
||||
%p
|
||||
PPS: Also revisited llvm (too complicated) and crystal (too complicated, bad fit in
|
||||
type system) after this. Could still do rust of course, but the more i write, the
|
||||
more i hear the call of simplicity (something that a normal person can still understand).
|