From 28c0ff30fdf0117d124c9f18b218d92daad612eb Mon Sep 17 00:00:00 2001 From: Torsten Ruger Date: Tue, 30 Sep 2014 23:17:12 +0300 Subject: [PATCH] first version of register machine post --- .../2014-09-30-a-better-register-machine.md | 72 +++++++++++++++++++ 1 file changed, 72 insertions(+) create mode 100644 _posts/2014-09-30-a-better-register-machine.md diff --git a/_posts/2014-09-30-a-better-register-machine.md b/_posts/2014-09-30-a-better-register-machine.md new file mode 100644 index 0000000..b832b29 --- /dev/null +++ b/_posts/2014-09-30-a-better-register-machine.md @@ -0,0 +1,72 @@ +--- +layout: news +author: Torsten +--- + +The register machine abstraction has been somewhat thin, and it is time to change that + +### Current affairs + +When i started, i started from the assembler side, getting arm binaries working and off course learning the arm cpu +instruction set in assembler memnonics. + +Not having **any** experience at this level i felt that arm was pretty sensible. Much better than i expected. And +so i abtracted the basic instruction classes a little and had the arm instructions implement them pretty much one +to one. + +Then i tried to implement any ruby logic in that abstraction and failed. Thus was born the virtual machine +abstraction of having Message, Frame and Self objects. This in turn mapped nicely to registers with indexed +addressing. + +### Addressing + +I just have to sidestep here a little about addressing: the basic problem is off course that we have no idea at +compile-time at what address the executable will end up. + +The problem first emerged with calling functions. Mostly because that was the only objects i had, and so i was +very happy to find out about pc relative addressing, in which you jump or call relative to your current position +(**p**rogram **c**ounter). Since the relation is not changed by relocation all is well. + +Then came the first strings and the aproach can be extended: instead of grabbing some memory location, ie loading +and address and dereferencing, we calculate the address in relation to pc and then dereference. This is great and +works fine. + +But the smug smile is wiped off the face when one tries to store references. This came with the whole object +aproach, the bootspace holding references to **all** objects in the system. I even devised a plan to always store +relative addresses. Not relative to pc, but relative to the self that is storing. This i'm sure would have +worked fine too, but it does mean that the running program also has to store those relative addresses (or have +different address types, shudder). That was a runtime burden i was not willing to accept. + +So there are two choices as far as i see: use elf relocation, or relocate in init code. And yet again i find myself +biased to the home-growm aproach. Off course i see that this is partly because i don't want to learn the innards of +elf as something very complicated that does a simple thing. But also because it is so simple i am hoping it isn't +such a big deal. Most of the code for it, object iteration, type testing, layout decoding, will be useful and +neccessary later anyway. + +### Concise instruction set + +So that addressing aside was meant to further the point of a need for a good register instruction set (to write the +relocation in). And the code that i have been writing to implement the vm instructions clearly shows a need for +a better model at the register model. + +On the other hand, the idea of Passes will make it very easy to have a completely sepeate register machine layer. +We just transfor the vm to that, and then later from that to arm (or later intel). So there are three things that i +am looking for with the new register machine instruction set: + +- easy to understand the model (ie register machine, pc, ..), free of real machine quirks +- small set of instructions that is needed for our vm +- better names for instructions + +Especially the last one: all the mvn and ldr is getting to me. It's so 50's, as if we didn't have the space to spell +out move or load. And even those are not good names, at least i am always wondering what is a move and what a load. +And as i explained above in the addressing, if i wanted to load an address of an object into a register with relative +addressing, i would actually have to do an add. But when reading an add instruction it is not an intuative +conclusion that a load is meant. And since this is a fresh effort i would rather change these things now and make +it easier for others to learn sensible stuff than me get used to cryptics only to have everyone after me do the same. + +So i will have instructions like RegisterMove, ConstantLoad, Branch, which will translate to mov, ldr and b in arm. I still like to keep the arm level with the traditional names, so people who actually know arm feel right at home. +But the extra register layer will make it easier for everyone who has not programmed assembler (and me!), +which i am guessing is quite a lot in the *ruby* community. + +In implementation terms it is a relatively small step from the vm layer to the register layer. And an even smaller +one to the arm layer. But small steps are good, easy to take, easy to understand, no stumbling.