draft for new post about register allocation
parent 760690718a
commit ab9365f423
14
Rakefile
@@ -4,3 +4,17 @@
require_relative 'config/application'

Rails.application.load_tasks

task :create_post do
  args = ARGV.dup
  args.shift
  args.each { |a| task a.to_sym do ; end }

  today = Time.now
  args.unshift today.day.to_s.rjust(2 , "0")
  args.unshift today.month.to_s.rjust(2 , "0")

  file = "app/views/posts/#{today.year}/_" + args.join("-") + ".haml"
  f = File.open(file , "w")
  f << "%p start"
end
BIN
app/assets/images/register_alloc.png
Normal file
Binary file not shown.
Size: 23 KiB
@@ -0,0 +1,146 @@
%p
  I had put off proper register allocation until now, and somehow the 100 commits
  it took verify that I wasn't completely wrong. I think there are enough
  differences to the normal problem and solution to warrant an explanation,
  so here it goes: how RubyX does Static Single Assignment and Register Allocation
  in a Ruby compiler.

%h2 Simple register allocation, the old way

%p
  To understand the problem, I'll explain the previous, much simpler, approach and its
  problems. Then I'll explain the new way, with reference to what I have read the "normal"
  way to be, and the differences I think we have.

%p
  Simple register allocation quite simply means hardcoding at least some register names
  into the generation of assembler-like code, and having very simple ways to deal with
  the rest. In RubyX the assembler-like layer of the code
  is called Risc, and it is basically a simplified ARM. ARM has registers r0
  through r15.
%p
  Early on I made two decisions. The first was that the current Message object would
  live in r0. This was one constant hardcoded throughout. The other decision was to
  design Slot level instructions such that each instruction had use of all registers.
  To hand out registers I implemented a simple stack, but that was reset after every
  Slot level instruction.

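%p
  To make the old scheme concrete, here is a minimal sketch of such a stack-like
  pool. It is not the RubyX code: the class name SimpleRegisterPool, its methods and
  the scratch range r1 to r12 are assumptions made up for illustration. Only the
  Message living in r0 and the reset after every Slot level instruction come from
  the description above.

```ruby
# Hypothetical sketch of the old scheme, not RubyX's actual implementation.
class SimpleRegisterPool
  MESSAGE = :r0                                   # the current Message always lives here
  SCRATCH = (1..12).map { |i| :"r#{i}" }.freeze   # assumed scratch range

  def initialize
    reset
  end

  # Hand out the next free scratch register.
  def use
    raise "out of registers" if @next >= SCRATCH.length
    reg = SCRATCH[@next]
    @next += 1
    reg
  end

  # Give back the most recently handed out register.
  def release
    raise "nothing to release" if @next.zero?
    @next -= 1
  end

  # Called after every Slot level instruction: everything is free again.
  def reset
    @next = 0
  end
end

pool   = SimpleRegisterPool.new
first  = pool.use   # => :r1
second = pool.use   # => :r2
pool.reset          # the Slot level instruction is done
again  = pool.use   # => :r1, nothing survives the instruction border
```

%p
  The reset is what keeps this simple, and also what makes it impossible to keep a
  value in a register from one Slot level instruction to the next.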
%h2 Motivation for change
%p
  There are two major problems with the simple solution outlined above. The first makes
  things sub-optimal now, the second is restrictive in the future.
%p
  The current problem is actually manifold. There is the obvious difficulty of keeping
  track of registers as they are used and returned. While I thought that the by-hand
  approach was not too bad, after finishing I checked, and the automatic way uses only
  half the number of registers. This is nevertheless peanuts compared to the second issue,
  which is that the code is forever locked into a subset of the SlotMachine, ie
  no optimisations can be done across SlotMachine Instruction borders. That is quite
  serious, and sort of ties in with the third issue: there were some
  register assumptions being made, but they were implicit, ie they could not be checked
  and would only show up if broken.
%p
  But the real motivation for this work came from the future, or rather the lack of
  expandability of this approach. Basically I found from benchmarking that inlining
  would have to happen quite soon, even if it would *only* be with more Macros at first.
  Inlining with this super simple allocation would not only be super hard, but also
  much less efficient than it could be. After all there are 10+ registers that one can
  keep things in, thus avoiding reloading, and I had already noticed that constant loading
  was a thing.

%h2 SSA and Register Allocation
.container
  %p.full_width
    =image_tag "register_alloc.png"
%p
  So then we come to the way that it is coded now. The basic idea (followed by most
  compilers) is to assign a new name for every new usage of a register. I'll talk
  about that first. The second step is something called liveness analysis,
  basically determining when registers are not used anymore, and the third is the
  allocation itself.

%h3 Static Single Assignment
%p
  A
  =ext_link "Static Single Assignment form" , "https://en.wikipedia.org/wiki/Static_single_assignment_form"
  of the Instructions is one where every register
  name is assigned only once, hence the Single Assignment. The static part means that
  this single assignment is only true for a static analysis, so at run time the code
  may assign many times. But it would be the **same** code doing several assignments.
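%p
  The single assignment property is easy to check mechanically. The following is a
  small sketch under assumed data structures: an instruction is modelled as an array
  whose first element is the name it assigns and whose remaining elements are the names
  it reads. The method name and the register names are made up for illustration.

```ruby
# Each instruction is [assigned_name, *names_it_reads] (an assumed, simplified model).
# The code is in SSA form if no name shows up as an assignment target twice.
def single_assignment?(instructions)
  assigned = instructions.map(&:first)
  assigned.uniq.length == assigned.length
end

not_ssa = [[:x, :a], [:x, :x, :b]]         # x is assigned by two instructions
ssa     = [[:x_1, :a], [:x_2, :x_1, :b]]   # every assignment got its own name

puts single_assignment?(not_ssa)   # false
puts single_assignment?(ssa)       # true
```

%p
  Note that the check looks at the instruction list only. If the second list sat inside
  a loop, x_2 would be written many times at run time and the form would still be
  static single assignment.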
%p
  SSA is often achieved by first naming registers according to the variables they hold,
  and deriving subsequent names by increasing a subscript index. This did not sound
  very fitting. For one, RubyX does not really have "floating" variables (that later
  may get popped on some stack), rather every variable is an instance variable.
  Assigning to a variable does not create a new register value, but needs to be stored
  in memory. In a concurrent environment it is not safe to bypass that.
%p
  New variables may be created by "traversing" into an instance of the object (if the type is
  known of course). This led to a dot syntax naming convention, where almost every
  variable starts off from *message* or as a constant, eg "message.return_value". This
  is as "single" as we need it to be (I think), and implementing this naming scheme
  was about half the work.
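%p
  As an illustration of the dot syntax, here is a sketch of a register name that records
  the path by which its value was reached. The class RegisterName and its slot method
  are hypothetical and not taken from RubyX, and the caller slot is only an assumed
  example of a second traversal step. Only "message.return_value" comes from the text.

```ruby
# Hypothetical sketch: a register name is the path used to reach its value.
class RegisterName
  attr_reader :path

  def initialize(*path)
    @path = path.map(&:to_sym).freeze
  end

  # "Traverse" into an instance variable, which yields a new, longer name.
  def slot(name)
    RegisterName.new(*@path, name)
  end

  def to_s
    @path.join(".")
  end

  # Two names are the same register name if they describe the same path.
  def ==(other)
    other.is_a?(RegisterName) && path == other.path
  end
  alias eql? ==

  def hash
    path.hash
  end
end

message = RegisterName.new(:message)
ret     = message.slot(:return_value)
nested  = message.slot(:caller).slot(:return_value)   # caller is an assumed slot

puts ret      # message.return_value
puts nested   # message.caller.return_value
```

%p
  Because the name is derived from the path, loading the same slot twice gives the same
  name, and the name alone tells you where the value came from.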
%p
  The great benefit of this renaming of registers is that even the risc code is still
  quite readable, which is great for both debugging and tests. This is because
  the registers now have meaningful names (instance variable names), and it is always
  clear where a value came from.

%h3 Liveness
%p
  The next big step was to determine the liveness of registers. This is something I have not
  found good literature on, and in documents about Register Allocation it is often
  taken as the starting point.
%p
  Basic reasoning led me to believe that a simple backward scan is at least a safe
  estimate. Imagine going through the list of instructions and marking the
  first occurrence of any register (name) use. By going backwards through the list
  you thus get the last usage, and that is the point where we can recycle that
  register.
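%p
  The backward scan fits in a few lines. This is a sketch under a simplifying
  assumption: each instruction is reduced to the list of register names it touches,
  and branches are ignored. The method name last_uses is made up for illustration.

```ruby
# Each instruction is modelled as the list of register names it touches.
# Returns a hash of name => index of the instruction that uses it last.
def last_uses(instructions)
  last = {}
  (instructions.length - 1).downto(0) do |index|
    instructions[index].each do |name|
      # The first sighting when walking backwards is the last use going forwards.
      last[name] ||= index
    end
  end
  last
end

code = [
  ["message", "message.receiver"],
  ["message.receiver", "message.receiver.type"],
  ["message", "message.return_value"],
]

uses = last_uses(code)
puts uses["message.receiver"]   # 1, so it can be recycled after instruction 1
puts uses["message"]            # 2
```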
%p
  I spent some time trying to figure out if backward branches change the fact that
  you can release the register, but could not come to a conclusion (brain melt
  every time I tried). Intuitively I think that you can, because on the first
  run through such a loop you could not use results from a register that, because of
  the ssa, would have had to be created later, but there you go. Even rereading that
  hurts. My final argument was that a backward jump is a while loop, and a ruby
  while loop would have to store its data in ruby variables and not new registers
  (or so I hope).
%p
  I did read about phi nodes in ssa and I did not implement them. Phi nodes are a way to
  ensure that different branches of an if produce the same registers, or that the same
  registers are meaningfully filled after the merge of an if. My hope is
  that the ruby variable argument gets us out of that, and for some risc functions
  I added some transfers to ensure things work as they should.

%h3 Register Allocation
%p
  The actual
  =ext_link "Register Allocation", "https://en.wikipedia.org/wiki/Register_allocation"
  is not substantially more complicated now than before. But there is a good
  base now for more analysis and optimisations.
%p
  So basically we go through the instruction sequence and assign registers in
  order. But because of the liveness analysis, we can release registers after their
  last use, and reuse them immediately of course. I noticed that this results in
  surprisingly many registers being used only for a single instruction.
  And the total number of registers used went **down by half**.

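%p
  Put together, an in-order allocation with release after last use could look roughly
  like the sketch below. It is not the RubyX allocator. The method name allocate, the
  modelling of an instruction as a list of names, and the choice to release a register
  right after the instruction of its last use are all assumptions for illustration.

```ruby
# Instructions are lists of register names (an assumed, simplified model).
# Returns a hash of name => machine register.
def allocate(instructions, registers)
  # Last use per name. Walking forwards and overwriting gives the same result
  # as marking the first sighting in a backward scan.
  last_use = {}
  instructions.each_with_index do |names, index|
    names.each { |name| last_use[name] = index }
  end

  free     = registers.dup
  assigned = {}
  instructions.each_with_index do |names, index|
    # Assign in order: every name not seen before takes the next free register.
    names.each do |name|
      next if assigned.key?(name)
      raise "out of registers at instruction #{index}" if free.empty?
      assigned[name] = free.shift
    end
    # Release after the instruction, so the register is reusable right away.
    names.uniq.each do |name|
      free.unshift(assigned[name]) if last_use[name] == index
    end
  end
  assigned
end

code = [
  ["message", "message.receiver"],
  ["message.receiver", "message.receiver.type"],
  ["message", "message.return_value"],
]

result = allocate(code, [:r0, :r1, :r2, :r3])
result.each { |name, reg| puts "#{name} -> #{reg}" }
```

%p
  In this tiny example four names fit into three registers, because
  message.receiver.type is dead after the second instruction and its register is handed
  straight to message.return_value.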
%h2 The future
%p
  As I said, this was mostly for the future, so what is the future going to hold?
%p
  Well, **inlining** is number one, that's for sure.
%p
  But there is also something called escape analysis. This basically means reclaiming
  objects that get created in a method, but are never passed out. A sort of immediate GC,
  thus not only saving on GC time, but also on allocation. Mostly though it would
  allow running larger benchmarks, because there is no GC and integers get created
  a lot before a meaningful number of milliseconds has elapsed.
%p
  On the register front, some low hanging fruit are redundant transfer elimination and
  double load elimination. Since methods have not yet grown enough to exhaust the registers,
  unloading registers is not done yet and thus is looming, which brings with it code cost
  analysis.
%p
  So much more fun to be had!!