unification strikes

parent 1fc8169a59
commit 06220e3735

_posts/2017-01-10-integer-unification.md (Normal file, 122 lines changed)
@ -0,0 +1,122 @@
---
layout: news
author: Torsten
---

I just read that MRI 2.4 "unifies" Fixnum and Bignum into Integer. This, it turns out, is
something quite different from what I thought: it is mostly about which class names are returned,
and about it being ok to have two implementations for the same class, Integer.

But even if it wasn't what I thought, it did spark an idea, and I hope a solution to a problem
that I have seen lurking ahead. Strangely, the solution may be even more radical than the
cross function jumps it replaces.

## A problem lurking ahead

As I have been thinking more about what happens when a type changes, I noticed something:

An object may change its type in one method (A), but may be used in a method (B) far up the call
stack. How does B know to treat the object differently? Specifically, the calls B makes on the
object were determined by the type before the change, so they will be wrong after the change.
B therefore needs to know about the type change.

Such a type change was supposed to be handled by a cross method jump, thus fixing the problem
in A. But propagating the change to B is cumbersome, and there can be any number of such B's.
Anything I thought of is quite a bit too involved. And this is before even thinking about closures.

## A step back

Looking at this from a little higher vantage point, there are maybe one too many things I have
been trying to avoid.

The first one was bit-tagging: the ruby (and smalltalk) way of tagging an integer with a marker
bit, thus losing a bit and gaining a gazillion type checks. In MRI's C land an object is a VALUE,
and a VALUE is either a tagged integer or a pointer to an object struct. So on **every** operation
the bit has to be checked. Both of these, the lost bit and the checks, I've been trying to avoid.
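
To make the cost of tagging concrete, here is a small simulation in ruby. It assumes 8-byte aligned
objects, so a real pointer always has 0 in its lowest bit, and it uses the encoding MRI uses for
Fixnums, `(n << 1) | 1` (what the C macro `INT2FIX` does). The helper names are made up for this
sketch, and overflow is ignored since ruby integers never overflow.

```ruby
# Simplified model of low-bit tagging. Assumption: objects are 8-byte
# aligned, so a real pointer always has 0 in its lowest bit.
FIXNUM_FLAG = 1

# Encode an integer as a tagged word: shift left, set the marker bit.
# This is where one bit of range is lost.
def tag_int(n)
  (n << 1) | FIXNUM_FLAG
end

# The check that has to come before every operation on a VALUE.
def fixnum?(value)
  (value & FIXNUM_FLAG) == FIXNUM_FLAG
end

# Decode: an arithmetic shift right drops the marker bit again.
def untag_int(value)
  value >> 1
end

# Even a plain addition must check both operands before the fast path.
def tagged_add(a, b)
  unless fixnum?(a) && fixnum?(b)
    raise TypeError, "slow path: an operand is an object pointer"
  end
  tag_int(untag_int(a) + untag_int(b))
end

pointer = 0x7f00_0010                                  # aligned "object address", low bit 0
puts untag_int(tagged_add(tag_int(20), tag_int(22)))   # => 42
puts fixnum?(pointer)                                  # => false
```

Every arithmetic operation pays for two bit tests and the shifts, which is the "gazillion type
checks" in miniature.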

So that led to a system with no explicit type information in the lowest level representation,
and thus to a large dance to keep that information in an external type system, and to keep that
type information up to date.

Of course the elephant in the room here is that I have also been trying to avoid making integers
and floats objects, i.e. keeping their C, or machine, representation, just like everyone else
before me. Too wasteful to even think otherwise.

## And a step forward

The inspiration that came from reading about the unification of integers was exactly that:
**to unify integers**. Unifying them with objects, i.e. **making integers objects**.

I have been struggling with the dichotomy between integers and objects for a long time. There
always seemed something so fundamentally wrong there. Ok, maybe if the actual hardware did the
tagging and that continuous checking, then maybe. But otherwise: one is a direct value, the other
an indirect one. It just seemed wrong.

Making Integers (and floats etc) first class citizens, objects with a type, bridges the chasm
very nicely. Of course it does so at a price, but I think it will be worth it.

## The price of Unification

Initially I wanted to make all objects the size of a cache line or multiples thereof. This is
something I'll have to let go of: Integer objects should naturally be 2 words, namely the type
and the actual value.
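
A toy model of that layout, in ruby: memory is an array of words, and an Integer object is a type
word followed by a value word. The names (`Memory`, `new_integer`, `INTEGER_TYPE`) are invented for
illustration only, and in a real system the type word would point to a type object rather than
hold a small number.

```ruby
# Toy word-addressed memory, only to illustrate the layout: every object
# starts with its type word. Assumption: types are small integer ids here.
INTEGER_TYPE = 1

class Memory
  def initialize
    @words = []   # one array slot stands for one machine word
  end

  # An Integer object is exactly two words: [type, value].
  def new_integer(value)
    address = @words.size
    @words << INTEGER_TYPE << value
    address
  end

  # Finding the type is a single load of the object's first word.
  def type_of(address)
    @words[address]
  end

  # Reading the value checks the type, then loads the second word.
  def integer_value(address)
    raise TypeError, "no Integer at #{address}" unless type_of(address) == INTEGER_TYPE
    @words[address + 1]
  end

  def size_in_words
    @words.size
  end
end

memory = Memory.new
a = memory.new_integer(7)
b = memory.new_integer(35)
puts memory.integer_value(a) + memory.integer_value(b)  # => 42
puts memory.size_in_words  # => 4: two words per integer, where tagging needs one
```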

So this doubles the amount of RAM used to represent integers. But maybe worse, it makes them
subject to garbage collection. Both can probably be alleviated by having the first 256 integers
pinned, i.e. preallocated in a fixed array, but still.

Also, using a dedicated memory manager for them and keeping a pool of unused ones as a linked
list should make allocation quick. And of course the main hope lies in the fact that your average
program nowadays (especially OO) does not really use integers all that much.
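
Roughly, such a memory manager could look like this. The sketch assumes "the first 256" means the
values 0 to 255, keeps them in a fixed array that is never collected, and chains released objects
into a singly linked free list. `IntegerPool` and its methods are hypothetical, and the garbage
collector that would call `release` is not shown.

```ruby
# Sketch of a dedicated allocator for integer objects. Assumptions: values
# 0..255 are allocated once and never collected; released objects are
# chained into a singly linked free list for reuse.
class IntegerPool
  PINNED = 256
  IntObj = Struct.new(:value, :next_free)

  attr_reader :allocated

  def initialize
    @pinned = Array.new(PINNED) { |i| IntObj.new(i, nil) }  # the fixed array
    @free_head = nil   # head of the linked list of unused objects
    @allocated = 0     # fresh objects created beyond the pinned ones
  end

  def get(value)
    return @pinned[value] if pinned_value?(value)
    obj = @free_head
    if obj                         # pop an unused object off the list
      @free_head = obj.next_free
      obj.next_free = nil
      obj.value = value
      obj
    else                           # list empty: really allocate
      @allocated += 1
      IntObj.new(value, nil)
    end
  end

  # Would be called by a (hypothetical) collector for unreachable objects.
  def release(obj)
    return if pinned_value?(obj.value) && @pinned[obj.value].equal?(obj)
    obj.next_free = @free_head     # push onto the free list
    @free_head = obj
  end

  private

  def pinned_value?(value)
    value.is_a?(Integer) && value >= 0 && value < PINNED
  end
end

pool = IntegerPool.new
puts pool.get(7).equal?(pool.get(7))  # => true, small values are shared
big = pool.get(1000)
pool.release(big)
puts pool.get(2000).equal?(big)       # => true, reused from the free list
puts pool.allocated                   # => 1
```

Both allocation and release are a couple of pointer moves, which is what should make this quick.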

## OO to the rescue

Of course this is not the first time my thoughts have strayed that way. There are two reasons why
they quickly scuttled back home to known territory before. The first was the automatic
optimization reflex: why use 2 words for something that can be done in one, and all that GC on top?

But the second was probably even more important: if we have the value inside the object (as a sort
of instance variable or array element), then when we return it we have the "naked" integer
wreaking havoc in our system, as the code expects objects everywhere. And if we don't return it,
then how do operations happen, since machines only operate on values?

The thing I had not considered is that this line of thinking mixes up the levels of abstraction.
It assumes a lower level than one needs: what is needed is that the system knows about integer
objects (in a similar way that the other approach assumes knowledge of integer values).

Concretely, the "machine", or compiler, needs to be able to perform the basic Integer operations
on the Integer objects. This is really not so different from it knowing how to perform the
operations on two values. It just involves getting the actual values out of the objects and
putting the result back into one.

OO helps in another way that never occurred to me: **data hiding**. We never actually pass out
the value. The value is private to the object and not accessible from the outside. In fact it is
not even accessible from inside the object itself. Admittedly this means more functionality in
the compiler, but since that is a solved problem (see builtin), it's ok.
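
Putting these two points together in a ruby sketch: the integer object has no reader for its value,
and a hypothetical `Builtin` module, standing in for the compiler's primitives, is the only code that
loads values out of the operands and stores the result into a new object. Ruby itself can reach any
instance variable by reflection, so the hiding here is only a convention of the sketch.

```ruby
# Toy integer object: the machine value sits in @raw and there is no
# reader for it, not even one used internally by the object.
class IntObject
  def initialize(raw)
    @raw = raw
  end

  def type
    :Integer
  end
end

# Hypothetical stand-in for the compiler's builtin primitives: the only
# code that touches raw values. Each operation loads the values out of
# the operand objects, operates on them, and stores the result in a new
# object, so no naked integer ever escapes.
module Builtin
  module_function

  # Plays the role of "check the type, then load the value word".
  def load_raw(obj)
    raise TypeError, "expected Integer, got #{obj.class}" unless obj.is_a?(IntObject)
    obj.instance_variable_get(:@raw)
  end

  def int_add(a, b)
    IntObject.new(load_raw(a) + load_raw(b))
  end
end

sum = Builtin.int_add(IntObject.new(20), IntObject.new(22))
puts sum.type                 # => Integer
puts Builtin.load_raw(sum)    # => 42
puts sum.respond_to?(:raw)    # => false
```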

## Unified method caching

So having gained this unification, we can now determine the type of an object very, very easily.
The type will *always* be the first word of the memory that the object occupies. We don't have
immediate values anymore, so always means always.

This is *very* handy, since we have given up being god and thus knowing everything at any time.
In concrete terms this means that in a method, we can *not* know what type an object is.
In fact it's worse: even if we have checked it, we can't say what type it is anymore once we
have passed it as an argument to another method.

Luckily programs are not random, and it is quite rare for an object to change type, so a given
object will usually have one of a very small set of types. Instead of looking up the method
statically and calling it unconditionally, we will need some kind of lookup at run-time, and
because the set of types is so small, that lookup can be cached: method caching.

The lookup tables can be objects that the method carries: a small table (3 entries) of pairs of
type and jump address. A little assembler goes through the list and jumps, or, in case of a miss,
jumps to a handler that does a real lookup in the type.
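
A sketch of such a table in ruby, under a few assumptions: a type is modelled as a Hash from method
name to a lambda, the lambda stands in for the jump address, and when the 3 slots are full the oldest
entry is evicted (what should happen then is not decided here). `CallSiteCache` and its methods are
hypothetical names.

```ruby
# Sketch of a per-call-site method cache. Assumptions: a type is anything
# that can look up a method by name (a Hash here), a "jump address" is a
# lambda, and the table keeps at most 3 (type, target) pairs.
class CallSiteCache
  SIZE = 3

  attr_reader :misses

  def initialize(method_name)
    @method_name = method_name
    @entries = []   # [[type, target], ...], searched front to back
    @misses = 0
  end

  # The caller passes the receiver's type, i.e. its first word.
  def dispatch(type, receiver, *args)
    entry = @entries.find { |cached_type, _| cached_type.equal?(type) }
    target = entry ? entry[1] : miss(type)
    target.call(receiver, *args)
  end

  private

  # The miss handler: a real lookup in the type, then remember the result.
  # Evicting the oldest entry when full is an arbitrary choice here.
  def miss(type)
    @misses += 1
    target = type.fetch(@method_name) do
      raise NoMethodError, "#{@method_name} not defined for this type"
    end
    @entries.shift if @entries.size == SIZE
    @entries << [type, target]
    target
  end
end

int_type = { plus: ->(receiver, other) { receiver + other } }
str_type = { plus: ->(receiver, other) { receiver + other } }  # a different type
site = CallSiteCache.new(:plus)
puts site.dispatch(int_type, 40, 2)     # => 42, miss: looked up and cached
puts site.dispatch(int_type, 1, 1)      # => 2, hit: no lookup
puts site.dispatch(str_type, "a", "b")  # => ab, second type, second miss
puts site.misses                        # => 2
```

A linear search is fine at this size: with 3 entries it is a handful of compares, which is roughly
what the little assembler loop would do.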

In the distant future a smaller version may be created. For the case where the type has already
been checked during the method, a further check may be inlined completely into the code, only
falling back to the table in case of a miss. But that's down the road a bit.
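
The inlined variant could be sketched like this, assuming the expected type is known when the code
is generated. `make_inlined_call` is a hypothetical helper standing in for the code a compiler would
emit: one identity compare and a direct call, with everything else handed to a fallback such as the
3-entry table.

```ruby
# Sketch of an inlined type check. make_inlined_call is hypothetical: it
# builds what would be one compare-and-jump in generated code.
def make_inlined_call(expected_type, direct_target, fallback)
  lambda do |type, receiver, *args|
    if type.equal?(expected_type)
      direct_target.call(receiver, *args)   # common case: no table at all
    else
      fallback.call(type, receiver, *args)  # miss: e.g. the 3-entry table
    end
  end
end

int_type = Object.new
float_type = Object.new
slow = ->(type, receiver) { "table lookup for #{receiver}" }
fast = ->(receiver) { receiver + 1 }
call_site = make_inlined_call(int_type, fast, slow)
puts call_site.call(int_type, 41)     # => 42
puts call_site.call(float_type, 1.5)  # => table lookup for 1.5
```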

Next question: how does this work with Parfait? Or with the interpreter?

index.html (14 lines changed)
@ -7,7 +7,7 @@ layout: site
 <div>
 <p class="center">
 <span>
-Interpreting code is like checking a map at every step: It can really slow you down.
+Putting wings on ruby to let you fly (may take X years).
 </span>
 </p>
 </div>
@ -18,12 +18,12 @@ layout: site
 <div class="span4">
 <h2 class="center">Goal</h2>
 <p>
-The goal is to execute (not interpret) object oriented code without external dependencies, on modern hardware.
+The goal is to execute (not interpret) object oriented code without external dependencies,
+on modern hardware.
 </p>
 <p>
-This means compiling dynamic code into binary. Using several intermediate representations it
-is possible to keep track of type changes and switch between differently typed, but
-logically equivalent, versions of methods.
+This means compiling dynamic code into binary. Using type knowledge at run-time we
+optimise and cache method dispatch for known types.
 
 As the system is 100% in ruby, the ultimate goal is to carry on the compilation at run-time,
 ie after the program has started.
@ -44,7 +44,7 @@ layout: site
 <a href="https://github.com/whitequark/parser"> ruby parser</a> to create:
 <ul>
 <li> An Object model of <a href="/typed/parfait.html">classes, types</a>, methods and basic types </li>
-<li> Several strongly typed method versions for every ruby instance method </li>
+<li> Methods for every type (may be several per class) </li>
 </ul>
 </p>
 <p>
@ -52,8 +52,8 @@ layout: site
 While it has well known typed language data semantics, it introduces several new concepts:
 <ul>
 <li> Object based memory (no global memory) </li>
-<li> Multiple implementations per function based on type </li>
 <li> Object oriented calling semantics (not stack based) </li>
+<li> Inline method caching. </li>
 <li> <a href="https://github.com/ruby-x/ruby/tree/master/lib/register" target="_blank">Register machine abstraction</a></li>
 <li> Extensible instruction set, with arm implementations
 </ul>
|
@ -25,7 +25,7 @@ Top down the layers are:
 - **Melon**, compiling ruby code into typed layer and includes bootstrapping code
 
 - **Typed intermediate layer:** Statically typed object oriented with object oriented
 call semantics.
 
 - **Risc register machine abstraction** provides a level of machine abstraction, but
 as the name says, quite a simple one.
@ -40,21 +40,17 @@ a difficult task, it has already been implemented in pure ruby
 [here](https://github.com/whitequark/parser). The output of the parser is again
 an ast, which needs to be compiled to the typed layer.
 
-The dynamic aspects of ruby are actually reltively easy to handle, once the whole system is
+The dynamic aspects of ruby are actually relatively easy to handle, once the whole system is
 in place, because the whole system is written in ruby without external dependencies.
 Since (when finished) it can compile ruby, it can do so to produce a binary. This binary can
 then contain the whole of the system, and so the resulting binary will be able to produce
 binary code when it runs. With small changes to the linking process (easy in ruby!) it can
 then extend itself.
 
-The type aspect is more tricky: Ruby is not typed and but the typed layer is after all. And
-if everything were objects (as we like to pretend in ruby) we could just do a lot of
-dynamic checking, possibly later introduce some caching. But everything is not an object,
-minimally integers are not, but maybe also floats and other values.
-The distinction between what is an integer and what an object has sprouted an elaborate
-type system, which is (by necessity) present in the typed layer.
+The type aspect is more tricky: Ruby is not typed but the typed layer is after all.
+But since everything is an object (yes, also integers and floats are first class citizens)
+we know the type of any object at any time and can check it easily.
+Easy checks also make inline method jump tables relatively easy.
 
-
-
 ### Typed intermediate layer
 
@ -68,26 +64,8 @@ In broad strokes it consists off:
 create a binary with the required information to be dynamic
 - **Builtin:** A very small set of primitives that are impossible to express in ruby
 
-The idea is to have different methods for different types, but implementing the same ruby
-logic. In contrast to the usual 1-1 relationship between a ruby method and it's binary
-definition, there is a 1-n.
-
-The typed layer defines the Type class and BasicTypes and also lets us return to different
-places from a function. By using this, we can
-compile a single ruby method into several typed functions. Each such function is typed, ie all
-arguments and variables are of known type. According to these types we can call functions according
-to their signatures. Also we can autognerate error methods for unhandled types, and predict
-that only a fraction of the possible combinations will actually be needed.
-
-
-Just to summarize a few of typed layer features that are maybe unusual:
-
 - **Message based calling:** Calling is completely object oriented (not stack based)
 and uses Message and Frame objects.
-- **Return addresses:** A method call may return to several addresses, according
-to type, and in case of exception
-- **Cross method jumps** When a type switch is detected, a method may jump into the middle
-of another method.
 
 
 ### Register Machine