draft for new post about register allocation

Torsten 2020-03-22 22:50:24 +02:00
parent 760690718a
commit ab9365f423
3 changed files with 160 additions and 0 deletions


@@ -4,3 +4,17 @@
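require_relative 'config/application'
Rails.application.load_tasks

# Creates a dated post stub, e.g. rake create_post register allocation
task :create_post do
  args = ARGV.dup
  args.shift # drop the task name itself
  # define empty tasks so rake does not complain about the title words
  args.each { |a| task a.to_sym do ; end }
  today = Time.now
  args.unshift today.day.to_s.rjust(2, "0")
  args.unshift today.month.to_s.rjust(2, "0")
  file = "app/views/posts/#{today.year}/_" + args.join("-") + ".haml"
  # block form closes the file once the stub is written
  File.open(file, "w") { |f| f << "%p start" }
end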

Binary file not shown (size: 23 KiB).


@@ -0,0 +1,146 @@
%p
I had put off proper register allocation until now, and somehow the 100 commits
it took verify that I wasn't completely wrong. I think there are enough
differences to the normal problem and solution to warrant an explanation,
so here it goes: how RubyX does Static Single Assignment and Register Allocation
in a ruby compiler.
%h2 Simple register allocation, the old way
%p
To understand the problem, I'll explain the previous, much simpler, approach and its
problems. Then I'll explain the new way, with reference to what I read the "normal"
way is and the differences I think we have.
%p
Simple register allocation is quite simply hardcoding at least some register names
into the generation of assembler-like code, and having very simple ways to deal with
the rest. In RubyX the assembler-like layer of the code
is called Risc, and that is basically a simplified ARM. ARM has registers from r0
through to r15.
%p
Early on I made two decisions. The first was that the current Message object would
live in r0. This was one constant hardcoded throughout. The other decision was to design
Slot level instructions such that each instruction had use of all registers. To hand out
registers I implemented a simple stack, but that was reset after every Slot level instruction.
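%p
A minimal sketch of that old scheme, not the actual RubyX code. The class name,
the method names and the pool of r1 to r12 are assumptions made up for illustration;
only the fixed r0 for the Message and the reset after every Slot level instruction
come from the description above.

```ruby
# Hypothetical sketch of the old, simple allocation (names are made up).
class SimpleRegisterStack
  MESSAGE = :r0                           # hardcoded home of the current Message
  FREE = (1..12).map { |i| :"r#{i}" }     # assumed pool of scratch registers

  def initialize
    reset
  end

  # hand out the next unused register
  def use_reg
    raise "out of registers" if @next >= FREE.length
    reg = FREE[@next]
    @next += 1
    reg
  end

  # called after every Slot level instruction, nothing survives the border
  def reset
    @next = 0
  end
end

stack = SimpleRegisterStack.new
first = [stack.use_reg, stack.use_reg]    # registers for one Slot instruction
stack.reset
second = [stack.use_reg]                  # the next instruction starts over at r1
```

%p
Because of the reset, nothing can stay in a register from one Slot level
instruction to the next.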
%h2 Motivation for change
%p
There are two major problems with the simple solution outlined above. The first is
sub-optimal now, the second restrictive in the future.
%p
The current problem is actually many-fold. There is the obvious difficulty of keeping
track of registers as they are used and returned. While I thought that the by-hand
approach was not too bad, after finishing I checked, and the automatic way uses only half the
amount of registers. This is nevertheless peanuts compared to the second problem,
which is that the code is forever locked into a subset of the SlotMachine, ie
no optimisations can be done across SlotMachine Instruction borders. That is quite
serious, and sort of ties in with the third problem. Namely, there were some implicit
register assumptions being made, but they were implicit, ie could not be checked
and would only show up if broken.
%p
But the real motivation for this work came from the future, or the lack of
expandability with this approach. Basically I found from benchmarking that inlining
would have to happen quite soon, even if it would *only* be with more Macros at first.
Inlining with this super simple allocation would not only be super hard, but also
much less efficient than it could be. After all there are 10+ registers that one can
keep things in, thus avoiding reloading, and I already noticed constant loading
was a thing.
%h2 SSA and Register Allocation
.container
%p.full_width
=image_tag "register_alloc.png"
%p
So then we come to the way that it is coded now. The basic idea (followed by most
compilers) is to assign new names for every new usage of a register. I'll talk
about that first, then the second step is something called liveness analysis,
basically determining when registers are not used anymore, and the third is the
allocation.
%h3 Static Single Assignment
%p
A
=ext_link "Static Single Assignment form" , "https://en.wikipedia.org/wiki/Static_single_assignment_form"
of the Instructions is one where every register
name is assigned only once. Hence the Single Assignment. The static part means that
this single assignment is only true for a static analysis, so at run time the code
may assign many times. But it would be the **same** code doing several assignments.
%p
SSA is often achieved by first naming registers according to the variables they hold,
and deriving subsequent names by increasing a subscript index. This did not sound
very fitting. For one, RubyX does not really have "floating" variables (that later
may get popped on some stack), rather every variable is an instance variable.
Assigning to a variable does not create a new register value, but needs to be stored
in memory. In a concurrent environment it is not safe to bypass that.
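%p
For comparison, the subscript naming that RubyX does *not* use could look roughly
like this for straight-line code. This is a sketch only: the instruction format
(destination, operation, sources) and the method name are invented for the example.

```ruby
# Classic subscript renaming for straight-line code (no branches, so no phi nodes).
# Each instruction is [destination, operation, *sources]; the format is made up.
def to_ssa(instructions)
  version = Hash.new(0)   # how often each variable has been assigned so far
  current = {}            # variable => its latest SSA name
  instructions.map do |dest, op, *sources|
    # sources read the latest name; constants and unknown names stay as they are
    renamed = sources.map { |s| current.fetch(s, s) }
    version[dest] += 1
    current[dest] = :"#{dest}_#{version[dest]}"
    [current[dest], op, *renamed]
  end
end

code = [
  [:a, :load, 1],
  [:b, :add, :a, :a],
  [:a, :add, :a, :b],     # second assignment to a gets a new name
]
ssa = to_ssa(code)
# ssa is [[:a_1, :load, 1], [:b_1, :add, :a_1, :a_1], [:a_2, :add, :a_1, :b_1]]
```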
%p
New variables may be created by "traversing" into an instance of the object (if the type is
known of course). This led to a dot syntax naming convention, where almost every
variable starts off from *message* or as a constant, eg "message.return_value". This
is as "single" as we need it to be (I think), and implementing this naming scheme
was about half the work.
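%p
The dot naming can be pictured with a small value object. This is an illustration
under assumptions, not RubyX's classes: RegisterName and slot are hypothetical names,
and receiver and type are made-up instance variables. Only "message.return_value"
is taken from the text.

```ruby
# Hypothetical value object for dotted register names (not a RubyX class).
RegisterName = Struct.new(:path) do
  # "traversing" into an instance variable yields a new, longer name
  def slot(name)
    self.class.new(path + [name])
  end

  def to_s
    path.join(".")
  end
end

message = RegisterName.new([:message])
ret = message.slot(:return_value)
type = message.slot(:receiver).slot(:type)   # receiver and type are assumed names

puts ret    # message.return_value
puts type   # message.receiver.type
```

%p
A name built this way is derived from where the value was loaded from, so the
same traversal always yields the same name.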
%p
The great benefit from this renaming of registers is that even the risc code is still
quite readable, which is great for both debugging and tests. This is because
the registers now have meaningful names (instance variable names), and it is always
clear where a value came from.
%h3 Liveness
%p
The next big step was to determine liveness of registers. This is something I have not
found good literature on, and in documents about Register Allocation it is often
taken as the starting point.
%p
Basic reasoning led me to believe that a simple backward scan is at least a safe
estimate. Imagine going through the list of instructions and marking the
first occurrence of any register (name) use. By going backwards through the list
you thus get the last usage, and that is the point where we can recycle that
register.
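%p
The backward scan can be sketched in a few lines. The instruction format here is an
assumption (each instruction is just the list of register names it touches), and the
sketch only covers straight-line code, so it says nothing about the backward
branches discussed next.

```ruby
# Backward scan: the first time a name is seen going backwards is its last use.
# Instructions are reduced to the register names they touch (made-up format).
def last_uses(instructions)
  last = {}
  (instructions.length - 1).downto(0) do |index|
    instructions[index].each { |name| last[name] ||= index }
  end
  last
end

code = [
  ["message", "message.receiver"],                 # 0
  ["message.receiver", "message.receiver.type"],   # 1
  ["message", "message.return_value"],             # 2
  ["message.return_value"],                        # 3
]
uses = last_uses(code)
# "message.receiver" is last used at index 1, so its register is free from index 2 on
```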
%p
I spent some time trying to figure out if backward branches change the fact that
you can release the register, but could not come to a conclusion (brain melt
every time I tried). Intuitively I think that you can, because on the first
run through such a loop you could not use results from a register that, because of
the SSA, would have had to be created later, but there you go. Even rereading that
hurts. My final argument was that a backward jump is a while loop, and a ruby
while loop would have to store its data in ruby variables and not new registers
(or so I hope).
%p
I did read about phi nodes in SSA and I did not implement them. Phi nodes are a way to
ensure that different branches of an if produce the same registers, or that the same
registers are meaningfully filled after the merge of an if. My hope is
that the ruby variable argument gets us out of that, and for some risc functions
I added some transfers to ensure things work as they should.
%h3 Register Allocation
%p
The actual
=ext_link "Register Allocation", "https://en.wikipedia.org/wiki/Register_allocation"
is not substantially more complicated now than before. But there is a good
base now to do more analysis and optimisations.
%p
So basically we go through the instruction sequence and assign registers in
order. But because of the liveness analysis, we can release registers after their
last use, and reuse them immediately of course. I noticed that this results in
surprisingly many registers being used only for a single instruction.
And the total number of registers used went **down by half**.
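%p
Put together, assigning in order and releasing after the last use could look like
the sketch below. Everything concrete in it is an assumption: the instruction format
(lists of symbolic names), the pool of four registers, and treating message like
any other name.

```ruby
# Hypothetical sketch: instructions are arrays of the symbolic names they touch,
# and the pool of real registers is an assumption.
def allocate(instructions, pool = %w[r1 r2 r3 r4])
  # last use of every name, found by a backward scan
  last = {}
  instructions.each_with_index.reverse_each do |names, index|
    names.each { |name| last[name] ||= index }
  end

  free = pool.dup
  assigned = {}   # symbolic name => real register
  instructions.each_with_index.map do |names, index|
    regs = names.map do |name|
      assigned[name] ||= free.shift || raise("out of registers")
    end
    # release registers whose name is never used again, so the next
    # instruction can reuse them immediately
    names.uniq.each do |name|
      free.unshift(assigned.delete(name)) if last[name] == index
    end
    regs
  end
end

code = [
  ["message", "message.receiver"],
  ["message.receiver", "message.receiver.type"],
  ["message", "message.return_value"],
  ["message.return_value"],
]
result = allocate(code)
# four symbolic names end up in three real registers
```

%p
In this toy run "message.receiver.type" holds its register for a single instruction,
and that register is handed straight to "message.return_value".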
%h2 The future
%p
As I said, this was mostly for the future, so what is the future going to hold?
%p
Well, **inlining** is number one, that's for sure.
%p
But also there is something called escape analysis. This basically means reclaiming
objects that get created in a method, but never passed out. A sort of immediate GC,
thus not only saving on GC time, but also on allocation. Mostly though it would
allow running larger benchmarks, because there is no GC and integers get created
a lot before a meaningful number of milliseconds has elapsed.
%p
On the register front, some low hanging fruits are redundant transfer elimination and
double load elimination. Since methods have not yet grown enough to exhaust the registers,
unloading registers is not done yet and thus is looming. Which brings with it code cost
analysis.
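%p
One possible reading of redundant transfer elimination, as a sketch: after allocation
a transfer may end up copying a register onto itself, and such an instruction can
simply be dropped. The instruction format and the method name are assumptions, and
this is not code that exists in RubyX.

```ruby
# Hypothetical pass: drop transfers whose source and target are the same register.
# Instruction format is made up: [operation, from, to].
def drop_redundant_transfers(instructions)
  instructions.reject do |op, from, to|
    op == :transfer && from == to
  end
end

code = [
  [:load, "r1", "r2"],
  [:transfer, "r2", "r2"],   # copies a register onto itself
  [:transfer, "r2", "r3"],
]
cleaned = drop_redundant_transfers(code)
# cleaned keeps the load and the real transfer from r2 to r3
```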
%p
So much more fun to be had!!