diff --git a/Rakefile b/Rakefile
index e85f913..1bbf23b 100644
--- a/Rakefile
+++ b/Rakefile
@@ -4,3 +4,17 @@ require_relative 'config/application'
 
 Rails.application.load_tasks
 
+
+task :create_post do
+  args = ARGV.dup
+  args.shift
+  args.each { |a| task a.to_sym do ; end }
+
+  today = Time.now
+  args.unshift today.day.to_s.rjust(2, "0")
+  args.unshift today.month.to_s.rjust(2, "0")
+
+  file = "app/views/posts/#{today.year}/_" + args.join("-") + ".haml"
+  # write and close the stub file in one go
+  File.write(file, "%p start")
+end
diff --git a/app/assets/images/register_alloc.png b/app/assets/images/register_alloc.png
new file mode 100644
index 0000000..8f0e1a9
Binary files /dev/null and b/app/assets/images/register_alloc.png differ
diff --git a/app/views/posts/2020/_03-22-register-allocation-in-a-ruby-compiler.haml b/app/views/posts/2020/_03-22-register-allocation-in-a-ruby-compiler.haml
new file mode 100644
index 0000000..76210a6
--- /dev/null
+++ b/app/views/posts/2020/_03-22-register-allocation-in-a-ruby-compiler.haml
@@ -0,0 +1,146 @@
+%p
+  I had put off proper register allocation until now, and somehow the 100 commits
+  it took verify that I wasn't completely wrong to do so. I think there are enough
+  differences from the normal problem and solution to warrant an explanation,
+  so here it goes: how RubyX does Static Single Assignment and Register Allocation
+  in a Ruby compiler.
+
+%h2 Simple register allocation, the old way
+
+%p
+  To understand the problem, I'll explain the previous, much simpler, approach and its
+  problems. Then I'll explain the new way, with reference to what I read the "normal"
+  way is, and the differences I think we have.
+
+%p
+  Simple register allocation means hardcoding at least some register names
+  into the generation of assembler-like code, and having very simple ways to deal with
+  the rest. In RubyX the assembler-like layer of the code
+  is called Risc, and that is basically a simplified ARM. ARM has registers from r0
+  through to r15.
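As a concrete illustration of that old scheme, here is a minimal Ruby sketch. The class and method names are invented for this post, not RubyX's actual code: a pool of ARM-style registers where r0 is reserved for the Message, and the rest are handed out from a simple stack that gets reset for each instruction.

```ruby
# Illustrative sketch (not RubyX's real classes): a pool of ARM-style
# registers. r0 is permanently reserved, the rest form a free stack.
class SimpleRegisterPool
  MESSAGE_REG = :r0  # hardcoded: the current Message always lives in r0

  def initialize
    reset
  end

  # every Slot-level instruction starts with all scratch registers free
  def reset
    @free = (1..15).map { |i| :"r#{i}" }
  end

  # hand out the next free register, in order
  def get_register
    raise "out of registers" if @free.empty?
    @free.shift
  end
end

pool = SimpleRegisterPool.new
pool.get_register  # => :r1
pool.get_register  # => :r2
pool.reset         # the next instruction starts from scratch
pool.get_register  # => :r1
```

The reset means no value can survive from one instruction to the next in a register, which is exactly the limitation the rest of the post addresses.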
+%p
+  Early on I made two decisions. The first was that the current Message object would
+  live in r0; this was one constant hardcoded throughout. The other decision was to
+  design Slot-level instructions such that each instruction had use of all registers.
+  To hand out registers I implemented a simple stack, but it was reset after every
+  Slot-level instruction.
+
+%h2 Motivation for change
+%p
+  There are two major problems with the simple solution outlined above. The first is
+  sub-optimal now, the second restrictive in the future.
+%p
+  The current problem is actually many-fold. There is the obvious difficulty of keeping
+  track of registers as they are used and returned. While I thought that the by-hand
+  approach was not too bad, after finishing I checked: the automatic way uses only half
+  the amount of registers. This is nevertheless peanuts compared to the second problem,
+  which is that the code is forever locked into a subset of the SlotMachine, i.e.
+  no optimisations can be done across SlotMachine instruction borders. That is quite
+  serious, and ties in with the third problem: there were some implicit
+  register assumptions being made, but as they were implicit they could not be checked
+  and would only show up when broken.
+%p
+  But the real motivation for this work came from the future, or rather from the lack of
+  expandability of this approach. Basically I found from benchmarking that inlining
+  would have to happen quite soon, even if it would *only* be with more Macros at first.
+  Inlining with this super simple allocation would not only be super hard, but also
+  much less efficient than it could be. After all, there are 10+ registers that one can
+  keep things in, thus avoiding reloading, and I had already noticed that constant
+  loading was a thing.
+
+%h2 SSA and Register Allocation
+.container
+  %p.full_width
+    =image_tag "register_alloc.png"
+%p
+  So then we come to the way that it is coded now.
+  The basic idea (followed by most
+  compilers) is to assign new names for every new usage of a register. I'll talk
+  about that first; the second step is something called liveness analysis,
+  basically determining when registers are not used anymore, and the third is the
+  allocation itself.
+
+%h3 Static Single Assignment
+%p
+  A
+  =ext_link "Static Single Assignment form", "https://en.wikipedia.org/wiki/Static_single_assignment_form"
+  of the instructions is one where every register
+  name is assigned only once. Hence the Single Assignment. The static part means that
+  this single assignment is only true for a static analysis, so at run time the code
+  may assign many times. But it would be the **same** code doing several assignments.
+%p
+  SSA is often achieved by first naming registers according to the variables they hold,
+  and deriving subsequent names by increasing a subscript index. This did not sound
+  very fitting. For one, RubyX does not really have "floating" variables (that later
+  may get popped off some stack); rather, every variable is an instance variable.
+  Assigning to a variable does not create a new register value, but needs to be stored
+  in memory. In a concurrent environment it is not safe to bypass that.
+%p
+  New variables may be created by "traversing" into an instance of the object (if the
+  type is known, of course). This led to a dot-syntax naming convention, where almost
+  every variable starts off from *message* or as a constant, e.g. "message.return_value".
+  This is as "single" as we need it to be (I think), and implementing this naming scheme
+  was about half the work.
+%p
+  The great benefit of this renaming of registers is that even the Risc code is still
+  quite readable, which is great for both debugging and tests. This is because
+  the registers now have meaningful names (instance variable names), and it is always
+  clear where each value came from.
+
+%h3 Liveness
+%p
+  The next big step was to determine the liveness of registers.
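Before getting to liveness, the dot-syntax naming described above can be sketched in a few lines. Class and method names here are hypothetical, invented for illustration, not RubyX's actual API:

```ruby
# Hypothetical sketch of the dot-syntax naming convention: every register
# name starts off from "message" or a constant, and traversing into an
# instance variable derives a new, longer name.
class RegisterName
  attr_reader :name

  def initialize(name)
    @name = name
  end

  # each traversal yields a brand-new name, so every name is assigned
  # only once: all the "single" in SSA that we need
  def traverse(ivar)
    RegisterName.new("#{@name}.#{ivar}")
  end
end

message = RegisterName.new("message")
message.traverse("return_value").name  # => "message.return_value"
```

Because a name like "message.return_value" records its whole derivation, the resulting Risc code stays readable without any subscript bookkeeping.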
+  This is something I have not
+  found good literature on, and in documents about Register Allocation it is often
+  taken as the starting point.
+%p
+  Basic reasoning led me to believe that a simple backward scan is at least a safe
+  estimate. Imagine going through the list of instructions and marking the
+  first occurrence of any register (name) use. By going backwards through the list
+  you thus get the last usage, and that is the point where we can recycle the
+  register.
+%p
+  I spent some time trying to figure out whether backward branches change the fact that
+  you can release the register, but could not come to a conclusion (brain melt
+  every time I tried). Intuitively I think that you can, because on the first
+  run through such a loop you could not use results from a register that, because of
+  the SSA, would have had to be created later, but there you go. Even rereading that
+  hurts. My final argument was that a backward jump is a while loop, and a Ruby
+  while loop would have to store its data in Ruby variables and not in new registers
+  (or so I hope).
+%p
+  I did read about phi nodes in SSA, but I did not implement them. Phi nodes are a way to
+  ensure that different branches of an if produce the same registers, or that the same
+  registers are meaningfully filled after the merge of an if. My hope is
+  that the Ruby variable argument gets us out of that, and for some Risc functions
+  I added transfers to ensure things work as they should.
+
+%h3 Register Allocation
+%p
+  The actual
+  =ext_link "Register Allocation", "https://en.wikipedia.org/wiki/Register_allocation"
+  is not substantially more complicated now than before. But there is a good
+  base now for more analysis and optimisations.
+%p
+  So basically we go through the instruction sequence and assign registers in
+  order. But because of the liveness analysis, we can release registers after their
+  last use, and reuse them immediately, of course.
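Both passes can be sketched in plain Ruby. The instruction representation and the "message.receiver" / "message.arg1" names are made up for this example (only "message.return_value" appears in the post); each instruction is reduced to just the list of SSA names it uses:

```ruby
# Made-up instruction list: each entry is the set of SSA names used there.
instructions = [
  ["message.receiver"],
  ["message.receiver", "message.arg1"],
  ["message.arg1", "message.return_value"],
  ["message.return_value"],
]

# Backward scan: the first time a name is seen going backwards is its
# last use going forwards, i.e. the point where its register can be recycled.
last_use = {}
(instructions.length - 1).downto(0) do |i|
  instructions[i].each { |name| last_use[name] ||= i }
end

# Forward pass: assign real registers in order, and push a register back
# onto the free list right after the instruction that last uses its name.
free = (1..15).map { |i| :"r#{i}" }
assigned = {}
instructions.each_with_index do |uses, i|
  uses.each { |name| assigned[name] ||= free.shift }
  uses.each { |name| free.unshift(assigned[name]) if last_use[name] == i }
end
```

In this trace "message.return_value" reuses r1, which was released after the last use of "message.receiver", so three names fit into two registers: the same reuse effect that halved the register count.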
+%p
+  I noticed that this results in
+  surprisingly many registers being used only for a single instruction,
+  and the total number of registers used went **down by half**.
+
+%h2 The future
+%p
+  As I said, this was mostly done for the future, so what is the future going to hold?
+%p
+  Well, **inlining** is number one, that's for sure.
+%p
+  But there is also something called escape analysis. This basically means reclaiming
+  objects that get created in a method but never passed out. A sort of immediate GC,
+  thus saving not only GC time, but also allocation. Mostly though, it would
+  allow running larger benchmarks, because there is no GC yet and integers get created
+  a lot before a meaningful number of milliseconds has elapsed.
+%p
+  On the register front, some low-hanging fruits are redundant transfer elimination and
+  double load elimination. Since methods have not yet grown to exhaust the registers,
+  register unloading is still undone and thus looming, which brings with it code cost
+  analysis.
+%p
+  So much more fun to be had!!
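To give an idea of how simple the first of those fruits is, here is a hedged sketch of redundant transfer elimination as a peephole pass. The `[op, to, from]` encoding is invented for this example, not RubyX's real instruction objects:

```ruby
# A transfer whose source and destination are the same register does
# nothing, so a single pass can drop it.
code = [
  [:transfer, :r1, :r1],          # redundant: moves r1 onto itself
  [:load,     :r2, :some_const],
  [:transfer, :r3, :r2],          # useful: keep
]

optimized = code.reject { |ins| ins[0] == :transfer && ins[1] == ins[2] }
```

A real pass would also chase transfer chains (r1 to r2, then r2 to r3), which is where the code cost analysis mentioned above starts to matter.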