diff --git a/_posts/2017-01-10-integer-unification.md b/_posts/2017-01-10-integer-unification.md
new file mode 100644
index 0000000..68d6fbb
--- /dev/null
+++ b/_posts/2017-01-10-integer-unification.md
@@ -0,0 +1,122 @@
+---
+layout: news
+author: Torsten
+---
+
+I just read mri 2.4 "unifies" Fixnum and Bignum into Integer. This, it turns out, is something quite
+different from what i thought, mostly about which class names are returned.
+And that it is ok to have two implementations for the same class, Integer.
+
+But even if it wasn't what i thought, it did spark an idea, and i hope a solution to a problem
+that i have seen lurking ahead. Strangely the solution may be even more radical than the
+cross function jumps it replaces.
+
+## A problem lurking ahead
+
+As i have been thinking more about what happens when a type changes, i noticed something:
+
+An object may change its type in one method (A), but may be used in a method (B), far up the call
+stack. How does B know to treat the object differently? Specifically, the calls B makes
+on the object are determined by the type before the change. So they will be wrong after the change,
+and so B needs to know about the type change.
+
+Such a type change was supposed to be handled by a cross method jump, thus fixing the problem
+in A. But the propagation to B is cumbersome, there can be so many of them.
+Anything that i thought of is quite a bit too involved. And this is before even thinking about closures.
+
+## A step back
+
+Looking at this from a little higher vantage there are maybe one too many things i have been trying
+to avoid.
+
+The first one was the bit-tagging. The ruby (and smalltalk) way of tagging an integer
+with a marker bit. Thus losing a bit and gaining a gazillion type checks. In mri c land
+an object is a VALUE, and a VALUE is either a tagged integer or a pointer to an object struct.
+So on **every** operation the bit has to be checked. Both of these i've been trying to avoid.
+
+So that led to a system with no explicit information in the lowest level representation and
+thus a large dance to have that information in an external type system and to keep that type
+information up to date.
+
+Of course the elephant in the room here is that i have also been trying to avoid making integers and
+floats objects. Ie keeping their c, or machine representation, just like anyone else before me.
+Too wasteful to even think otherwise.
+
+## And a step forward
+
+The inspiration that came by reading about the unification of integers was exactly that:
+**to unify integers**. Unifying with objects, ie **making integers objects**.
+
+I have been struggling with the dichotomy between integer and objects for a long time. There always
+seemed something so fundamentally wrong there. Ok, maybe if the actual hardware would do the tagging
+and that continuous checking, then maybe. But otherwise: one is a direct, the other an indirect
+value. It just seemed wrong.
+
+Making Integers (and floats etc) first class citizens, objects with a type, resolves the chasm
+very nicely. Of course it does so at a price, but i think it will be worth it.
+
+## The price of Unification
+
+Initially i wanted to make all objects the size of a cache line or multiples thereof. This is
+something i'll have to let go of: Integer objects should naturally be 2 words, namely the type
+and the actual value.
+
+So this is doubling the amount of ram used to represent integers. But maybe worse, it makes them
+subject to garbage collection. Both can probably be alleviated by having the first 256 pinned, ie
+a fixed array, but still.
+
+Also using a dedicated memory manager for them and keeping a pool of unused ones as a linked list
+should make it quick. And of course the main hope lies in the fact that your average program
+nowadays (especially oo) does not really use integers all that much.
+
+## OO to the rescue
+
+Of course this is not the first time my thoughts have strayed that way. There are two reasons why
+they quickly scuttled back home to known territory before. The first was the automatic optimization
+reflex: why use 2 words for something that can be done in one, and all that gc on top.
+
+But the second was probably even more important: If we then have the value inside the object
+(as a sort of instance variable or array element), then when we return it we have the "naked"
+integer wreaking havoc in our system, as the code expects objects everywhere.
+And if we don't return it, then how do operations happen, since machines only operate on values.
+
+The thing that i had not considered is that that line of thinking is mixing up the levels
+of abstraction. It assumes a lower level than one needs: What is needed is that the system
+knows about integer objects (in a similar way that the other way assumes knowledge of integer
+values).
+
+Concretely the "machine", or compiler, needs to be able to perform the basic Integer operations,
+on the Integer objects. This is really not so different from it knowing how to perform the
+operations on two values. It just involves getting the actual values from the object and
+putting them back.
+
+OO helps in another way that never occurred to me. **Data hiding:** we never actually pass out
+the value. The value is private to the object and not accessible from the outside. In fact it is not
+even accessible from the inside, to the object itself. Admittedly this means more functionality in
+the compiler, but since that is a solved problem (see builtin), it's ok.
+
+## Unified method caching
+
+So having gained this unification, we can now determine the type of an object very very easily.
+The type will *always* be the first word of the memory that the object occupies. We don't have
+immediate values anymore, so always is always.
+
+This is *very* handy, since we have given up being god and thus knowing everything at any time.
+In concrete terms this means that in a method, we can *not* know what type an object is.
+In fact it's worse, we can't even say what type it is, even if we have checked it, once we
+have passed it as an argument to another method.
+
+Luckily programs are not random, and it is quite rare for an object to change type, and so a given
+object will usually have one of a very small set of types. This can be used to do method caching.
+Instead of looking up the method statically and calling it unconditionally at run-time, we will
+need some kind of lookup at run-time.
+
+The lookup tables can be objects that the method carries. A small table (3 entries) with pairs of
+type vs jump address. A little assembler to go through the list and jump, or in case of a miss
+jump to some handler that does a real lookup in the type.
+
+In a distant future a smaller version may be created. For the case where the type has been
+checked already during the method, a further check may be inlined completely into the code and
+only revert to the table in case of a miss. But that's down the road a bit.
+
+Next question: How does this work with Parfait? Or the interpreter?
diff --git a/index.html b/index.html
index ae510d6..7cef7a3 100755
--- a/index.html
+++ b/index.html
@@ -7,7 +7,7 @@ layout: site
- Interpreting code is like checking a map at every step: It can really slow you down. + Putting wings on ruby to let you fly (may take X years).
- The goal is to execute (not interpret) object oriented code without external dependencies, on modern hardware. + The goal is to execute (not interpret) object oriented code without external dependencies, + on modern hardware.
- This means compiling dynamic code into binary. Using several intermediate representations it - is possible to keep track of type changes and switch between differently typed, but - logically equivalent, versions of methods. + This means compiling dynamic code into binary. Using type knowledge at run-time we + optimise and cache method dispatch for known types. As the system is 100% in ruby, the ultimate goal is to carry on the compilation at run-time, ie after the program has started. @@ -44,7 +44,7 @@ layout: site ruby parser to create:
@@ -52,8 +52,8 @@ layout: site While it has well known typed language data semantics, it introduces several new concept: