draft for new post about register allocation
parent
760690718a
commit
ab9365f423
14
Rakefile
@ -4,3 +4,17 @@
require_relative 'config/application'

Rails.application.load_tasks

task :create_post do
  args = ARGV.dup
  args.shift
  args.each { |a| task a.to_sym do ; end }

  today = Time.now
  args.unshift today.day.to_s.rjust(2 , "0")
  args.unshift today.month.to_s.rjust(2 , "0")

  file = "app/views/posts/#{today.year}/_" + args.join("-") + ".haml"
  f = File.open(file , "w")
  f << "%p start"
end
BIN
app/assets/images/register_alloc.png
Normal file
Binary file not shown.
After Width: | Height: | Size: 23 KiB |
@ -0,0 +1,146 @@
%p
  I had put off proper register allocation until now, and somehow the 100 commits
  it took verify that I wasn't completely wrong. I think there are quite a
  few differences to the normal problem and solution, enough to warrant an explanation,
  so here it goes: how RubyX does Static Single Assignment and Register Allocation
  in a ruby compiler.

%h2 Simple register allocation, the old way

%p
  To understand the problem, I'll explain the previous, much simpler, approach and its
  problems. Then I'll explain the new way, with reference to what I read the "normal"
  way is, and the differences I think we have.

%p
  Simple register allocation is quite simply hardcoding at least some register names
  into the generation of assembler-like code, and having very simple ways to deal with
  the rest. In RubyX the assembler-like layer of the code
  is called Risc, and that is basically a simplified ARM. ARM has registers from r0
  through to r15.
%p
  Early on I made two decisions. The first was that the current Message object would
  live in r0. This was one constant hardcoded throughout. The other decision was to
  design Slot level instructions such that each instruction had use of all registers.
  To hand out registers I implemented a simple stack, but that was reset after every
  Slot level instruction.
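%p
  As a rough illustration of that old scheme (a sketch only, not the RubyX code:
  the class name, the method names and the scratch range r1 to r12 are all
  assumptions made for this example), a per-instruction register stack could look
  like this:

```ruby
# Hypothetical sketch of the old, simple scheme. None of these names are RubyX API.
class SimpleRegisterStack
  # The current Message object is pinned to r0 (as described above).
  MESSAGE = :r0
  # Assumed range of scratch registers, chosen only for illustration.
  SCRATCH = (1..12).map { |i| :"r#{i}" }.freeze

  def initialize
    reset
  end

  # Called before every Slot level instruction: all scratch registers are free again.
  def reset
    @next = 0
  end

  # Hand out the next free scratch register.
  def use
    raise "out of registers" if @next >= SCRATCH.length
    reg = SCRATCH[@next]
    @next += 1
    reg
  end

  # Give back the most recently handed out register.
  def release
    raise "nothing to release" if @next.zero?
    @next -= 1
  end
end

stack  = SimpleRegisterStack.new
first  = stack.use   # first scratch register
second = stack.use   # second scratch register
stack.reset          # the next Slot instruction starts from scratch
again  = stack.use   # same register as `first`, nothing survives the border
```

%p
  The reset is the important part: whatever was in a register at the end of one
  Slot level instruction is forgotten at the start of the next.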

%h2 Motivation for change
%p
  There are two major problems with the simple solution outlined above. The first is
  that it is sub-optimal now, the second that it is restrictive in the future.
%p
  The current problem is actually many-fold. There is the obvious difficulty of keeping
  track of registers as they are used and returned. While I thought that the by-hand
  approach was not too bad, after finishing I checked, and the automatic way uses only
  half the amount of registers. This is nevertheless peanuts compared to the second
  problem, which is that the code is forever locked into a subset of the SlotMachine, ie
  no optimisations can be done across SlotMachine Instruction borders. That is quite
  serious, and sort of ties in with the third problem. Namely, there were some implicit
  register assumptions being made, but they were implicit, ie could not be checked
  and would only show up if broken.
%p
  But the real motivation for this work came from the future, or rather the lack of
  expandability of this approach. Basically I found from benchmarking that inlining
  would have to happen quite soon, even if it would *only* be with more Macros at first.
  Inlining with this super simple allocation would not only be super hard, but also
  much less efficient than it could be. After all there are 10+ registers that one can
  keep things in, thus avoiding reloading, and I already noticed that constant loading
  was a thing.

%h2 SSA and Register Allocation
.container
  %p.full_width
    =image_tag "register_alloc.png"
%p
  So then we come to the way that it is coded now. The basic idea (followed by most
  compilers) is to assign new names for every new usage of a register. I'll talk
  about that first. The second step is something called liveness analysis,
  basically determining when registers are not used anymore, and the third is the
  allocation.

%h3 Static Single Assignment
%p
  A
  =ext_link "Static Single Assignment form" , "https://en.wikipedia.org/wiki/Static_single_assignment_form"
  of the Instructions is one where every register
  name is assigned only once. Hence the Single Assignment. The static part means that
  this single assignment is only true for a static analysis, so at run time the code
  may assign many times. But it would be the **same** code doing several assignments.
%p
  SSA is often achieved by first naming registers according to the variables they hold,
  and then deriving subsequent names by increasing a subscript index. This did not sound
  very fitting. For one, RubyX does not really have "floating" variables (that later
  may get popped on some stack), rather every variable is an instance variable.
  Assigning to a variable does not create a new register value, rather the value needs
  to be stored in memory. In a concurrent environment it is not safe to bypass that.
%p
  New variables may be created by "traversing" into an instance of the object (if the type
  is known of course). This led to a dot syntax naming convention, where almost every
  variable starts off from *message* or as a constant, eg "message.return_value". This
  is as "single" as we need it to be (I think), and implementing this naming scheme
  was about half the work.
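%p
  To make the dot naming convention a bit more concrete, here is a minimal sketch.
  It is not how RubyX implements it: the RegisterName class and the "receiver" and
  "type" names are assumptions for illustration, only "message.return_value" comes
  from the text above.

```ruby
# Hypothetical value object for a register name built by "traversing" from a root.
class RegisterName
  attr_reader :path

  def initialize(*path)
    @path = path.map(&:to_sym).freeze
  end

  # Traversing into an instance variable yields a new, longer name.
  # The original name is left untouched, so every name stays "single".
  def [](ivar)
    RegisterName.new(*path, ivar)
  end

  def to_s
    path.join(".")
  end
end

message = RegisterName.new(:message)
ret     = message[:return_value]
type    = message[:receiver][:type]

puts ret    # message.return_value
puts type   # message.receiver.type
```

%p
  Because a name records the whole path it was reached by, it stays readable in the
  generated code and tells you where the value came from.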
%p
  The great benefit of this renaming of registers is that even the risc code is still
  quite readable, which is great for both debugging and tests. This is because
  the registers now have meaningful names (instance variable names), and it is always
  clear where a value came from.

%h3 Liveness
%p
  The next big step was to determine the liveness of registers. This is something I have
  not found good literature on, and in documents about Register Allocation it is often
  taken as the starting point.
%p
  Basic reasoning led me to believe that a simple backward scan is at least a safe
  estimate. Imagine going through the list of instructions and marking the
  first occurrence of any register (name) use. By going backwards through the list
  you thus get the last usage, and that is the point where we can recycle that
  register.
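%p
  The backward scan can be sketched in a few lines. This is a simplified
  illustration under assumptions, not the RubyX implementation: an instruction is
  modelled as a Struct that just lists the register names it touches, and the
  instruction and register names in the example are made up.

```ruby
# Assumed, simplified model: an instruction only knows which register names it touches.
Instruction = Struct.new(:name, :registers)

# Returns a hash of register name => index of the last instruction that uses it.
def last_uses(instructions)
  last = {}
  (instructions.length - 1).downto(0) do |index|
    instructions[index].registers.each do |reg|
      # Scanning from the back, the first sighting of a name is its last use.
      last[reg] ||= index
    end
  end
  last
end

code = [
  Instruction.new(:slot_to_reg, ["message", "message.receiver"]),
  Instruction.new(:slot_to_reg, ["message.receiver", "message.receiver.type"]),
  Instruction.new(:reg_to_slot, ["message.receiver.type", "message"]),
]

last_uses(code).each do |reg, index|
  puts "#{reg} can be recycled after instruction #{index}"
end
```

%p
  Note that this sketch treats the instruction list as straight-line code and does
  nothing special for branches.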
%p
  I spent some time trying to figure out if backward branches change the fact that
  you can release the register, but could not come to a conclusion (brain melt
  every time I tried). Intuitively I think that you can, because on the first
  run through such a loop you could not use results from a register that, because of
  the ssa, would have had to be created later, but there you go. Even rereading that
  hurts. My final argument was that a backward jump is a while loop, and a ruby
  while loop would have to store its data in ruby variables and not new registers
  (or so I hope).
%p
  I did read about phi nodes in ssa and I did not implement them. Phi nodes are a way to
  ensure that different branches of an if produce the same registers, or that the same
  registers are meaningfully filled after the merge of an if. My hope is
  that the ruby variable argument gets us out of that, and for some risc functions
  I added some transfers to ensure things work as they should.

%h3 Register Allocation
%p
  The actual
  =ext_link "Register Allocation", "https://en.wikipedia.org/wiki/Register_allocation"
  is not substantially more complicated now than before. But there is now a good
  base for more analysis and optimisations.
%p
  So basically we go through the instruction sequence and assign registers in
  order. But because of the liveness analysis, we can release registers after their
  last use, and of course reuse them immediately. I noticed that this results in
  surprisingly many registers being used only for a single instruction.
  And the total number of registers used went **down by half**.
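%p
  Put together, assigning in order and releasing after the last use could look
  roughly like the sketch below. Again this is an illustration under assumptions and
  not the RubyX allocator: instructions are plain arrays of register names, the pool
  of machine registers is passed in, and there is no handling of branches or of
  running out of registers beyond raising an error.

```ruby
# Hypothetical allocator sketch: maps SSA style names to a small pool of machine registers.
# instructions: array of arrays of register names, in program order
# available:    array of machine register symbols that may be handed out
def allocate(instructions, available)
  # Liveness: backward scan, first sighting from the back is the last use.
  last_use = {}
  (instructions.length - 1).downto(0) do |index|
    instructions[index].each { |name| last_use[name] ||= index }
  end

  free     = available.dup
  live     = {}   # name => machine register, only for names currently alive
  assigned = {}   # name => machine register, for every name ever seen

  instructions.each_with_index do |names, index|
    # Assign a machine register to every name that does not have one yet.
    names.uniq.each do |name|
      next if live.key?(name)
      raise "out of registers at instruction #{index}" if free.empty?
      live[name] = assigned[name] = free.shift
    end
    # Release after the last use, so the very next instruction can reuse the register.
    names.uniq.each do |name|
      free.unshift(live.delete(name)) if last_use[name] == index
    end
  end
  assigned
end

code = [
  ["message", "message.receiver"],
  ["message.receiver", "message.receiver.type"],
  ["message", "message.return_value"],
]
allocation = allocate(code, %i[r0 r1 r2 r3])
allocation.each { |name, reg| puts "#{name} -> #{reg}" }
```

%p
  In this small example four names fit into three machine registers, because the
  register of "message.receiver.type" is free again by the time
  "message.return_value" needs one.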

%h2 The future
%p
  As I said, this was mostly for the future, so what is the future going to hold?
%p
  Well, **inlining** is number one, that's for sure.
%p
  But there is also something called escape analysis. This basically means reclaiming
  objects that get created in a method, but are never passed out. A sort of immediate GC,
  thus not only saving on GC time, but also on allocation. Mostly though it would
  allow running larger benchmarks, because there is no GC and integers get created
  a lot before a meaningful number of milliseconds has elapsed.
%p
  On the register front, some low hanging fruits are redundant transfer elimination and
  double load elimination. Since methods have not yet grown enough to exhaust the
  registers, unloading registers is not done yet and thus is looming, which brings with
  it code cost analysis.
%p
  So much more fun to be had!!