%p
I just read that mri 2.4 “unifies” Fixnum and Integer. This, it turns out, is something quite
different from what i thought: it is mostly about which class names are returned,
and that it is ok to have two implementations for the same class, Integer.
%p
But even if it wasn’t what i thought, it did spark an idea, and i hope a solution to a problem
that i have seen lurking ahead. Strangely, the solution may be even more radical than the
cross function jumps it replaces.
%h2#a-problem-lurking-ahead A problem lurking ahead
%p As i have been thinking more about what happens when a type changes, i noticed something:
%p
An object may change its type in one method (A), but may be used in a method (B), far up the call
stack. How does B know to treat the object differently? Specifically, the calls B makes
on the object are determined by the type before the change. So they will be wrong after the change,
and so B needs to know about the type change.
%p
Such a type change was supposed to be handled by a cross method jump, thus fixing the problem
in A. But the propagation to B is cumbersome, as there can be just so many methods like B.
Anything that i thought of is quite a bit too involved. And this is before even thinking about closures.
%h2#a-step-back A step back
%p
Looking at this from a slightly higher vantage point, there are maybe one too many things i have been trying
to avoid.
%p
The first one was the bit-tagging: the ruby (and smalltalk) way of tagging an integer
with a marker bit, thus losing a bit and gaining a gazillion type checks. In mri c land
an object is a VALUE, and a VALUE is either a tagged integer or a pointer to an object struct.
So on
%strong every
operation the bit has to be checked. Both of these i’ve been trying to avoid.
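%p
The tagging scheme described above can be sketched in plain ruby. This is a toy model and not
mri's code: a machine word is modelled as a ruby Integer, the low bit is assumed to be the marker,
and the helper names (tag_integer, tagged_integer?, untag_integer, add_words) are made up for
illustration.

```ruby
# Toy model of bit-tagging. A "word" is a plain ruby Integer here.
# Assumption: the lowest bit set means "this word is an integer, not a pointer".
TAG_BIT = 1

# Shift the value up by one bit and set the marker bit.
def tag_integer(n)
  (n << 1) | TAG_BIT
end

# The check that has to happen before every operation.
def tagged_integer?(word)
  (word & TAG_BIT) == TAG_BIT
end

# Drop the marker bit again to get at the machine value.
def untag_integer(word)
  word >> 1
end

# Even a simple addition pays for two checks, two untags and one retag.
def add_words(a, b)
  unless tagged_integer?(a) && tagged_integer?(b)
    raise TypeError, "not a tagged integer"
  end
  tag_integer(untag_integer(a) + untag_integer(b))
end
```

%p
Both costs are visible in the sketch: one bit of every word is lost to the marker, and
add_words cannot touch its arguments without checking them first.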
%p
So that led to a system with no explicit information in the lowest level representation, and
thus a large dance to keep that information in an external type system and to keep that type
information up to date.
%p
Of course the elephant in the room here is that i have also been trying to avoid making integers and
floats objects, ie keeping their c, or machine, representation, just like everyone else before me.
Too wasteful to even think otherwise.
%h2#and-a-step-forward And a step forward
%p
The inspiration that came from reading about the unification of integers was exactly that:
%strong to unify integers
\. Unifying with objects, ie
%strong making integers objects
%p
I have been struggling with the dichotomy between integers and objects for a long time. There always
seemed to be something fundamentally wrong there. Ok, maybe if the actual hardware did the tagging
and that continuous checking, then maybe. But otherwise: one is a direct, the other an indirect
value. It just seemed wrong.
%p
Making Integers (and floats etc) first class citizens, objects with a type, bridges the chasm
very nicely. Of course it does so at a price, but i think it will be worth it.
%h2#the-price-of-unification The price of Unification
%p
Initially i wanted to make all objects the size of a cache line or multiples thereof. This is
something i’ll have to let go of: Integer objects should naturally be 2 words, namely the type
and the actual value.
%p
So this doubles the amount of ram used to represent integers. But maybe worse, it makes them
subject to garbage collection. Both can probably be alleviated by having the first 256 pinned, ie
kept in a fixed array, but still.
%p
Also, using a dedicated memory manager for them and keeping a pool of unused ones as a linked list
should make allocation quick. And of course the main hope lies in the fact that your average program
nowadays (especially oo) does not really use integers all that much.
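%p
A minimal sketch of how the pinned array and the pool of unused objects could fit together.
Everything here is an assumption made for illustration: the names IntObject and IntegerPool,
the symbols standing in for types, and the choice to reuse the value word of an unused object
as the link to the next unused one.

```ruby
# Hypothetical model: an integer object is two "words", the type and the value.
IntObject = Struct.new(:type, :value)

class IntegerPool
  PINNED = 256 # the first 256 integers live in a fixed array and are never freed

  def initialize
    @small = Array.new(PINNED) { |i| IntObject.new(:Integer, i).freeze }
    @free_head = nil # head of the linked list of unused objects
  end

  # Return an object for n: a pinned one if n is small, else a recycled or new one.
  def allocate(n)
    return @small[n] if n >= 0 && n < PINNED
    obj = @free_head
    return IntObject.new(:Integer, n) unless obj
    @free_head = obj.value # the value word doubled as the "next unused" link
    obj.type = :Integer
    obj.value = n
    obj
  end

  # Hand an unused object back to the pool. Pinned objects are left alone.
  def release(obj)
    return if obj.frozen?
    obj.type = :Free
    obj.value = @free_head
    @free_head = obj
  end
end
```

%p
Allocating a small integer is then just an array lookup, and allocating a larger one is at most
a pop from the list, which is where the hope for speed comes from.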
%h2#oo-to-the-rescue OO to the rescue
%p
Of course this is not the first time my thoughts have strayed that way. There are two reasons why
they quickly scuttled back home to known territory before. The first was the automatic optimization
reflex: why use 2 words for something that can be done in one, and add all that gc on top?
%p
But the second was probably even more important: If we then have the value inside the object
(as a sort of instance variable or array element), then when we return it we have the “naked”
integer wreaking havoc in our system, as the code expects objects everywhere.
And if we don’t return it, then how do operations happen, since machines only operate on values?
%p
The thing that i had not considered is that that line of thinking mixes up the levels
of abstraction. It assumes a lower level than one needs: what is needed is that the system
knows about integer objects (in a similar way that the other approach assumes knowledge of integer
values).
%p
Concretely the “machine”, or compiler, needs to be able to perform the basic Integer operations
on the Integer objects. This is really not so different from it knowing how to perform the
operations on two values. It just involves getting the actual values from the objects and
putting the result back.
%p
OO helps in another way that never occurred to me.
%strong Data hiding:
we never actually pass out
the value. The value is private to the object and not accessible from the outside. In fact it is not
even accessible from the inside, to the object itself. Admittedly this means more functionality in
the compiler, but since that is a solved problem (see builtin), it’s ok.
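%p
The idea of the compiler operating on Integer objects without ever handing out the value can be
sketched with a toy memory model. This is only an illustration under assumptions: memory is a
ruby Array of words, an object is the index of its first word, and the names HEAP, new_integer
and builtin_plus are invented here. Unlike the description above, the sketch allocates a fresh
object for the result instead of putting the value back into an existing one.

```ruby
# Toy memory: a flat array of words. An "object" is the address of its first word.
HEAP = []
INTEGER_TYPE = :Integer # stands in for a reference to the Integer type

# Lay out a two word integer object: the type first, then the value.
def new_integer(value)
  address = HEAP.size
  HEAP.push(INTEGER_TYPE, value)
  address
end

# What the compiler might emit for Integer plus: check the types, load the
# two value words, add them, and wrap the result in an object again.
# Only this builtin ever sees the naked values.
def builtin_plus(left, right)
  unless HEAP[left] == INTEGER_TYPE && HEAP[right] == INTEGER_TYPE
    raise TypeError, "builtin_plus expects two integer objects"
  end
  sum = HEAP[left + 1] + HEAP[right + 1]
  new_integer(sum)
end
```

%p
Callers of builtin_plus pass addresses in and get an address back, so the rest of the system
only ever handles objects.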
%h2#unified-method-caching Unified method caching
%p
So having gained this unification, we can now determine the type of an object very very easily.
The type will
%em always
be the first word of the memory that the object occupies. We don’t have
immediate values anymore, so always is always.
%p
This is
%em very
handy, since we have given up being god and thus knowing everything at any time.
In concrete terms this means that in a method, we can
%em not
know what type an object is.
In fact it’s worse: even if we have checked an object’s type, we can no longer say what type it is
once we have passed it as an argument to another method.
%p
Luckily programs are not random, and it is quite rare for an object to change type, so a given
object will usually have one of a very small set of types. This can be used to do method caching.
Instead of looking up the method statically and calling it unconditionally at run-time, we will
need some kind of lookup at run-time.
%p
The lookup tables can be objects that the method carries: a small table (3 entries) with pairs of
type vs jump address, and a little assembler to go through the list and jump or, in case of a miss,
jump to some handler that does a real lookup in the type.
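%p
A rough ruby sketch of such a table, under loud assumptions: the class name CallSiteCache is
invented, receiver.class stands in for reading the first word of the object, an UnboundMethod
stands in for the jump address, and evicting the oldest entry when the table is full is a choice
made here, not something the text prescribes.

```ruby
# Hypothetical per-call-site cache: at most 3 pairs of [type, target].
class CallSiteCache
  SIZE = 3
  attr_reader :misses

  def initialize(method_name)
    @method_name = method_name
    @entries = [] # pairs of [type, unbound method]
    @misses = 0
  end

  # Walk the small table; jump on a hit, otherwise go through the miss handler.
  def call(receiver, *args)
    type = receiver.class # stands in for "the first word of the object"
    entry = @entries.find { |cached_type, _target| cached_type.equal?(type) }
    target = entry ? entry[1] : handle_miss(type)
    target.bind(receiver).call(*args)
  end

  private

  # The real lookup in the type. The result is remembered for next time.
  def handle_miss(type)
    @misses += 1
    target = type.instance_method(@method_name)
    @entries.shift if @entries.size >= SIZE # assumption: evict the oldest pair
    @entries.push([type, target])
    target
  end
end
```

%p
A call site that only ever sees one or two receiver types pays for the real lookup once per type
and then only for a short linear scan.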
%p
In a distant future a smaller version may be created. For the case where the type has been
checked already during the method, a further check may be inlined completely into the code and
only revert to the table in case of a miss. But that’s down the road a bit.
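%p
The inlined variant might look roughly like the following. Again a sketch with invented names:
make_guarded_call bakes one expected type and its target into a lambda, and the block passed in
plays the role of the table based lookup that is used on a miss.

```ruby
# Hypothetical inlined check: one expected type and its target are fixed
# when the call site is built, everything else goes to the fallback.
def make_guarded_call(expected_type, method_name, &fallback)
  target = expected_type.instance_method(method_name)
  lambda do |receiver, *args|
    if receiver.class.equal?(expected_type) # the single inlined comparison
      target.bind(receiver).call(*args)
    else
      fallback.call(receiver, *args)        # miss: revert to the table
    end
  end
end
```

%p
On the expected path this costs a single comparison, and the table is only consulted when the
guess turns out wrong.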
%p Next question: How does this work with Parfait? Or the interpreter?