draft for new post about register allocation

2020-03-22 22:50:24 +02:00
parent 760690718a
commit ab9365f423
3 changed files with 160 additions and 0 deletions
--- a/14
+++ b/14
@ -4,3 +4,17 @@
 require_relative 'config/application'
 Rails.application.load_tasks
 task :create_post  do
  args = ARGV.dup
  args.shift
  args.each { |a| task a.to_sym do ; end }
  today = Time.now
  args.unshift today.day.to_s.rjust(2 , "0")
  args.unshift today.month.to_s.rjust(2 , "0")
  file = "app/views/posts/#{today.year}/_" + args.join("-") + ".haml"
  # block form closes the file handle once the stub is written
  File.open(file , "w") { |f| f << "%p start" }
 end
--- a/app/assets/images/register_alloc.png
+++ b/app/assets/images/register_alloc.png
--- a/app/views/posts/2020/_03-22-register-allocation-in-a-ruby-compiler.haml
+++ b/app/views/posts/2020/_03-22-register-allocation-in-a-ruby-compiler.haml
@ -0,0 +1,146 @@
 %p
  I had put off proper register allocation until now, and the 100 commits
  it took somehow verify that i wasn't completely wrong to wait. I think there are
  enough differences to the normal problem and solution to warrant an explanation,
  so here it goes: how RubyX does Static Single Assignment and Register Allocation
  in a ruby compiler.
 %h2 Simple register allocation, the old way
 %p
  To understand the problem, i'll explain the previous, much simpler, approach and its
  problems. Then i'll explain the new way, with reference to what i understand the "normal"
  way to be and the differences i think we have.
 %p
  Simple register allocation quite simply means hardcoding at least some register names
  into the generation of assembler-like code, and having very simple ways to deal with
  the rest. In RubyX the assembler-like layer of the code
  is called Risc, and that is basically a simplified ARM. ARM has registers from r0
  through to r15.
 %p
  Early on i made two decisions. The first was that the current Message object would live
  in r0; this was the one constant hardcoded throughout. The other decision was to design
  Slot level instructions such that each instruction had use of all registers. To hand out
  registers i implemented a simple stack, but that was reset after every Slot level instruction.
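 %p
  To make the old scheme concrete, here is a minimal sketch of such a stack. The class
  name, the method names and the range of scratch registers are assumptions made for
  illustration, this is not the code RubyX actually used.

```ruby
# Hypothetical sketch of a per-instruction register stack, not RubyX's actual code.
class SimpleRegisterStack
  MESSAGE = :r0                        # the Message object is pinned to r0

  def initialize(first: 1, last: 12)   # assumed range of scratch registers
    @first = first
    @last = last
    reset
  end

  # hand out the next free scratch register
  def use_reg
    raise "ran out of registers" if @next > @last
    reg = "r#{@next}".to_sym
    @next += 1
    reg
  end

  # give back the most recently handed out register
  def release_reg
    raise "nothing to release" if @next == @first
    @next -= 1
  end

  # called after every Slot level instruction, so nothing survives across them
  def reset
    @next = @first
  end
end

stack = SimpleRegisterStack.new
first_instruction = [stack.use_reg, stack.use_reg]   # [:r1, :r2]
stack.reset
second_instruction = [stack.use_reg]                 # [:r1] again
```

 %p
  Because of the reset, the second instruction starts from r1 again and can not rely on
  anything the first one loaded.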
 %h2 Motivation for change
 %p
  There are two major problems with the simple solution outlined above. The first makes
  the code sub-optimal now, the second restricts what can be done in the future.
 %p
  The current problem is actually many-fold. There is the obvious difficulty of keeping
  track of registers as they are used and returned. While i thought that the by hand
  approach was not too bad, after finishing i checked, and the automatic way uses only half
  the amount of registers. This is nevertheless peanuts compared to the second problem,
  which is that the code is forever locked into single SlotMachine instructions, ie
  no optimisations can be done across SlotMachine Instruction borders. That is quite
  serious, and sort of ties in with the third problem. Namely, there were some
  register assumptions being made, but they were implicit, ie could not be checked
  and would only show up if broken.
 %p
  But the real motivation for this work came from the future, or the lack of
  expandability with this approach. Basically i found from benchmarking that inlining
  would have to happen quite soon, even if it would *only* be with more Macros at first.
  Inlining with this super simple allocation would not only be super hard, but also
  much less efficient than it could be. After all there are 10+ registers that one can
  keep things in, thus avoiding reloading, and i already noticed constant loading
  was a thing.
 %h2 SSA and Register Allocation
 .container
  %p.full_width
    =image_tag "register_alloc.png"
 %p
  So then we come to the way that it is coded now. The basic idea (followed by most
  compilers) is to assign a new name for every new usage of a register. I'll talk
  about that first. The second step is something called liveness analysis,
  basically determining when registers are not used anymore, and the third is the
  allocation.
 %h3 Static Single Assignment
 %p
  A
  =ext_link "Static Single Assignment form" , "https://en.wikipedia.org/wiki/Static_single_assignment_form"
  of the Instructions is one where every register
  name is assigned only once. Hence the Single Assignment. The static part means that
  this single assignment is only true for a static analysis, so at run time the code
  may assign many times. But it would be the **same** code doing several assignments.
 %p
  SSA is often achieved by first naming registers according to the variables they hold,
  and deriving subsequent names by incrementing a subscript index. This did not sound
  very fitting. For one, RubyX does not really have "floating" variables (that later
  may get popped on some stack), rather every variable is an instance variable.
  Assigning to a variable does not create a new register value; the value needs to be stored
  in memory. In a concurrent environment it is not safe to bypass that.
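 %p
  For comparison, this is roughly what the subscript scheme looks like for straight-line
  code. It is a textbook-style sketch and not something RubyX does; the instruction format
  (an array with the target first, then the operands) and the method name are made up for
  the example, and branches (and thus phi nodes) are ignored.

```ruby
# Textbook-style renaming for straight-line code (no branches, so no phi nodes).
# Instructions are [target, operand, ...] arrays; a hypothetical format.
# String operands are variable names, anything else is treated as a constant.
def rename_to_ssa(instructions)
  version = Hash.new(0)     # how often each variable has been assigned so far
  current = ->(name) { version.key?(name) ? "#{name}_#{version[name]}" : name }
  instructions.map do |target, *operands|
    # operands refer to the latest version that exists before this assignment
    used = operands.map { |op| op.is_a?(String) ? current.call(op) : op }
    version[target] += 1
    ["#{target}_#{version[target]}", *used]
  end
end

rename_to_ssa([["x", 1], ["y", "x"], ["x", "x", "y"]])
# => [["x_1", 1], ["y_1", "x_1"], ["x_2", "x_1", "y_1"]]
```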
 %p
  New variables may be created by "traversing" into an instance of the object (if the type is
  known, of course). This led to a dot syntax naming convention, where almost every
  variable starts off from *message* or as a constant, eg "message.return_value". This
  is as "single" as we need it to be (i think), and implementing this naming scheme
  was about half the work.
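 %p
  The naming idea can be sketched in a few lines. The class below is hypothetical and only
  illustrates the convention; apart from "message.return_value" the instance variable names
  are invented for the example.

```ruby
# Hypothetical register name that records the path it was loaded along.
class RegisterName
  attr_reader :path

  def initialize(*path)
    @path = path.map(&:to_sym)
  end

  # loading an instance variable yields a new name, the old one is untouched
  def traverse(ivar)
    RegisterName.new(*@path, ivar)
  end

  def to_s
    @path.join(".")
  end
end

message = RegisterName.new(:message)
message.traverse(:return_value).to_s            # "message.return_value"
message.traverse(:receiver).traverse(:type).to_s # "message.receiver.type"
```

 %p
  Since a traversal never changes an existing name, the same path always gives the same
  name, and the name alone says where the value was loaded from.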
 %p
  The great benefit from this renaming of registers is that even the risc code is still
  quite readable, which is great for both debugging and tests. This is because
  the registers now have meaningful names (instance variable names), and it is always
  clear where a value came from.
 %h3 Liveness
 %p
  The next big step was to determine the liveness of registers. This is something i have not
  found good literature on, and in documents about Register Allocation it is often
  taken as the starting point.
 %p
  Basic reasoning led me to believe that a simple backward scan is at least a safe
  estimate. Imagine going through the list of instructions and marking the
  first occurrence of any register (name). By going backwards through the list,
  the first occurrence you meet is the last usage, and that is the point where we
  can recycle that register.
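 %p
  The backward scan fits in a few lines. This is a sketch under simplifying assumptions:
  instructions are plain arrays of the register names they touch (a made up format), and
  branches are ignored entirely.

```ruby
# Instructions as arrays of the register names they touch (hypothetical format).
# Returns a hash from register name to the index of the instruction using it last.
def last_uses(instructions)
  last = {}
  (instructions.length - 1).downto(0) do |index|
    instructions[index].each do |name|
      last[name] ||= index      # first sighting from the back is the last use
    end
  end
  last
end

code = [
  ["message", "message.receiver"],
  ["message.receiver", "message.receiver.type"],
  ["message", "message.return_value"]
]
last_uses(code)
# => {"message"=>2, "message.return_value"=>2,
#     "message.receiver"=>1, "message.receiver.type"=>1}
```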
 %p
  I spent some time trying to figure out if backward branches change the fact that
  you can release the register, but could not come to a conclusion (brain melt
  every time i tried). Intuitively i think that you can: on the first
  run through such a loop you could not use the result from a register that, because of
  the ssa, would only be created later. But there you go, even rereading that
  hurts. My final argument was that a backward jump is a while loop, and a ruby
  while loop would have to store its data in ruby variables and not in new registers
  (or so i hope).
 %p
  I did read about phi nodes in ssa and did not implement them. Phi nodes are a way to
  ensure that different branches of an if produce the same registers, or that the same
  registers are meaningfully filled after the merge of an if. My hope is
  that the ruby variable argument gets us out of that, and for some risc functions
  i added some transfers to ensure things work as they should.
 %h3 Register Allocation
 %p
  The actual
  =ext_link "Register Allocation", "https://en.wikipedia.org/wiki/Register_allocation"
  is not substantially more complicated now than before. But there is now a good
  base for more analysis and optimisations.
 %p
  So basically we go through the instruction sequence and assign registers in
  order. But because of the liveness analysis, we can release registers after their
  last use, and of course reuse them immediately. I noticed that this results in
  surprisingly many registers being used only for a single instruction.
  And the total number of registers used went **down by half**.
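 %p
  Putting the steps together, an in-order allocation with release after last use could
  look like the sketch below. The method name and the instruction format (arrays of
  register names) are assumptions for illustration, not RubyX's implementation, and there
  is no spilling: running out of registers simply raises.

```ruby
# Hypothetical sketch: assign machine registers in order, recycling after last use.
def allocate(instructions, registers)
  # the last use per name; a forward scan where later uses overwrite earlier
  # ones gives the same result as scanning backwards for the first sighting
  last_use = {}
  instructions.each_with_index do |names, index|
    names.each { |name| last_use[name] = index }
  end

  free = registers.dup
  assigned = {}
  instructions.each_with_index do |names, index|
    names.each do |name|
      next if assigned.key?(name)
      raise "out of registers at instruction #{index}" if free.empty?
      assigned[name] = free.shift
    end
    # release after the instruction, so the register is free for the next one
    names.uniq.each do |name|
      free.unshift(assigned[name]) if last_use[name] == index
    end
  end
  assigned
end

code = [
  ["message", "message.receiver"],
  ["message.receiver", "message.receiver.type"],
  ["message", "message.return_value"]
]
allocate(code, [:r1, :r2, :r3])
# => {"message"=>:r1, "message.receiver"=>:r2,
#     "message.receiver.type"=>:r3, "message.return_value"=>:r3}
```

 %p
  Four names fit into three registers here, because r3 is released right after the one
  instruction that uses "message.receiver.type" and is handed out again for the next name.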
 %h2 The future
 %p
  As i said, this was mostly for the future, so what is the future going to hold?
 %p
  Well, **inlining** is number one, that's for sure.
 %p
  But also there is something called escape analysis. This basically means reclaiming
  objects that get created in a method, but are never passed out. A sort of immediate GC,
  thus not only saving on GC time, but also on allocation. Mostly though it would
  allow running larger benchmarks, because there is no GC yet and a lot of integers get
  created before a meaningful number of milliseconds has elapsed.
 %p
  On the register front, some low hanging fruits are redundant transfer elimination and
  double load elimination. Since methods have not yet grown enough to exhaust the registers,
  unloading registers (spilling) is not done yet, and thus is looming. That brings with it
  code cost analysis.
 %p
  So much more fun to be had!!