163 lines
8.0 KiB
Plaintext
163 lines
8.0 KiB
Plaintext
%p
|
|
I had put off proper register allocation until now, and somehow the 100 commits
|
|
it took verfies that i wasn't completely wrong. I think there are quite a
|
|
few differences to the normal problem and solution, to warrant an explanation,
|
|
so here it goes: How RubyX does Static Single Assignment and Register Allocation
|
|
in a ruby compiler.
|
|
|
|
%h2 Simple register allocation, the old way
|
|
|
|
%p
|
|
To understand the problem, i'll explain the previous, much simpler, approach and it's
|
|
problems. Then i'll explain the new way, with reference to what i read the "normal"
|
|
way is and the differences i think we have.
|
|
|
|
%p
|
|
Simple register allocation is quite simply hardcoding at least some register names
|
|
into the generation of assembler-like code, and having very simple ways to deal with
|
|
the rest. In RubyX the assembler-like layer of the code
|
|
is called Risc, and that is basically a simplified ARM. ARM has registers from r0
|
|
through to r15.
|
|
%p
|
|
Early on i made two decisions, the current Message object would live in r0. This was
|
|
one constant hardcoded throughout. The other decision was to design Slot level
|
|
instructions such, that each instruction had use of all registers. To use registers
|
|
i implemented a simple stack, but that was reset after every Slot level instruction.
|
|
|
|
%h2 Motivation for change
|
|
%p
|
|
There are two major problems with the simple solution outlined above. The first is
|
|
that it is sub-optimal now, the second that it is restrictive in the future.
|
|
%p
|
|
The current problem, is actually many-fold. There is the obvious difficulty of keeping
|
|
track of registers as they are used and returned. While i thought that the
|
|
%i by hand
|
|
approach was not too bad, after finishing this work, i checked the automatic way uses
|
|
only half the amount of registers. This is nevertheless peanuts compared to the second
|
|
problem, which means that the code is forever locked into a subset of the SlotMachine,
|
|
ie no optimisations can be done across SlotMachine Instruction borders. That is quite
|
|
serious, and sort of binds in with the third problem. Namely there were some implicit
|
|
register assumptions being made, but there were implicit, ie could not be checked
|
|
and would only show up if broken.
|
|
%p
|
|
But the real motivation for this work came from the future, or the lack of
|
|
expandability with this approach. Basically i found from benchmarking that inlining
|
|
would have to happen quite soon. Even it would
|
|
%em only
|
|
be with more Macros at first.
|
|
Inlining with this super simple allocation would not only be super hard, but also
|
|
much less efficient than could be. After all there are 10+ registers that one can
|
|
keep things in, thus avoiding reloading, and i already noticed constant loading
|
|
was a thing.
|
|
|
|
%h2 SSA and Register Allocation
|
|
|
|
%p
|
|
So then we come to the way that it is coded now. The basic idea (followed by most
|
|
compilers) is to assign new names for every new usage of a register. I'll talk
|
|
about that first. Then the second step is something called liveliness analysis;
|
|
basically determining when register are not used anymore, and the third is the
|
|
allocation.
|
|
|
|
%h3 Static Single Assignment
|
|
%p
|
|
A
|
|
=ext_link "Static Single Assignment form" , "https://en.wikipedia.org/wiki/Static_single_assignment_form"
|
|
of the Instructions is one where every register
|
|
name is assigned only once. Hence the
|
|
%em Single
|
|
Assignment. The static part means that this single assignment is only true for a static
|
|
analysis, so at run time the code may assign many times.
|
|
But it would be the
|
|
%em same
|
|
code doing several assignments.
|
|
%p
|
|
SSA is often achieved by first naming registers according to the variables they hold,
|
|
and to derive subsequent names by increasing an subscript index. This did not sound
|
|
very fitting. For one, RubyX does not really have "floating" variables (that later
|
|
may get popped on some stack), rather every variable is an instance variable.
|
|
Assigning to a variable does not create a new register value, but needs to be stored
|
|
in memory. In a concurrent environment it is not safe to bypass that.
|
|
%p
|
|
New variables may be created by "traversing" into a instance of the object (if type is
|
|
known off course). This lead to a dot syntax naming convention, where almost every
|
|
variable starts off from
|
|
%b message
|
|
or as a constant, eg "message.return_value". This
|
|
is as "single" as we need it to be (i think), and implementing this naming scheme
|
|
was about half the work (more than half of that in tests, where register names
|
|
were and are checked).
|
|
%p
|
|
The great benefit from this renaming of registers is that even the risc code is now
|
|
quite readable, which is great for debugging and tests. This is because
|
|
the registers now have meaningful names (instance variable names), and it is always
|
|
clear what a register is used for.
|
|
|
|
%h3 Liveliness
|
|
%p
|
|
The next big step was to determine liveliness of registers. This is something i have not
|
|
found good literature on, and in documents about Register Allocation it is often
|
|
taken as the starting point.
|
|
%p
|
|
Basic reasoning lead me to believe that a simple backward scan is at least a safe
|
|
estimate. If you imagine going trough the list of instructions and marking the
|
|
first occurence of any register use. By going backwards through the list
|
|
you thus get the last usage, and that is the point where we can recycle that
|
|
register.
|
|
%p
|
|
I spent some time trying to figure out if backward branches changes the fact that
|
|
you can release the register, but could not come to a conclusion (brain melt
|
|
every time i tried). Intuitively i think that you can, because on the first
|
|
run through such a loop you could not use results from a register that, because of
|
|
the ssa, would have had to be created later, but there you go. Even rereading that
|
|
hurts. My final argument was that a backward jump is a while loop, and a ruby
|
|
while loop would have to store its data in ruby variables and not new registers,
|
|
or so i hope).
|
|
%p
|
|
I did read about phi nodes in ssa and i did not implement that. Phi nodes are a way to
|
|
ensure that different branches of an
|
|
%b if
|
|
produce the same registers, or the same
|
|
registers are meaningfuly filled after the merge of an
|
|
%b if.
|
|
My hope is that the ruby variable argument from above gets us out of that, and for some
|
|
risc functions i added some transfers to ensure things work as they should.
|
|
|
|
%h3 Register Allocation
|
|
%p
|
|
The actual
|
|
=ext_link "Register Allocation", "https://en.wikipedia.org/wiki/Register_allocation"
|
|
is not substantially more complicated now, than before. But there is a good
|
|
base now to make more analysis and optimisations.
|
|
%p
|
|
|
|
So basically we go through the instruction sequence and assign registers in
|
|
order. But because of the liveliness analysis, we can release registers after their
|
|
last use, and reuse them immediately off course. I noticed that this results in
|
|
surprisingly many registers being used only for a single instruction.
|
|
And the total number of registers used went
|
|
%b down by half.
|
|
|
|
%h2 The future
|
|
%p
|
|
As i said that this was mostly for the future, what is the future going to hold?
|
|
Well,
|
|
%b inlining
|
|
is high up, that's for sure.
|
|
%p
|
|
But also there is something called escape analysis. This essentially means reclaiming
|
|
objects that are created in a method, but never get passed out. A sort of immediate GC,
|
|
thus not only saving on GC time, but also on allocation. Mostly though it would
|
|
allow to run larger benchmarks, because there is no GC, and integers get created
|
|
a lot before a meaningful number of milliseconds has elapsed.
|
|
%p
|
|
On the register front, some low hanging fruits are redundant transfer elimination and
|
|
double load elimination. Since methods have not grown to exhaust registers,
|
|
unloading register is undone and thus is looming. Which brings with it code cost
|
|
analysis. So much more fun to be had!!
|
|
%p
|
|
I am happy to announce that RubyX is part of
|
|
=ext_link "Rails Girls Summer of Code" , "https://railsgirlssummerofcode.org/"
|
|
and some interest is being show. Since i have enjoyed my last RGSoC summer,
|
|
i am looking forward to some mentoring, and outside participation.
|