ruby-x.github.io/_posts/2014-09-30-a-better-register-machine.md
2014-09-30 23:17:12 +03:00

layout: news
author: Torsten

The register machine abstraction has been somewhat thin, and it is time to change that.

Current affairs

When i started, i started from the assembler side, getting arm binaries working and of course learning the arm cpu instruction set in assembler mnemonics.

Not having any experience at this level, i felt that arm was pretty sensible. Much better than i expected. And so i abstracted the basic instruction classes a little and had the arm instructions implement them pretty much one to one.

Then i tried to implement any ruby logic in that abstraction and failed. Thus was born the virtual machine abstraction of having Message, Frame and Self objects. This in turn mapped nicely to registers with indexed addressing.

Addressing

I just have to sidestep here a little about addressing: the basic problem is of course that we have no idea at compile-time at what address the executable will end up.

The problem first emerged with calling functions, mostly because those were the only objects i had. So i was very happy to find out about pc relative addressing, in which you jump or call relative to your current position (program counter). Since the relation is not changed by relocation, all is well.

Then came the first strings, and the approach can be extended: instead of grabbing some memory location, ie loading an address and dereferencing, we calculate the address in relation to pc and then dereference. This is great and works fine.
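The arithmetic behind pc relative addressing can be sketched in a few lines of ruby. This is only an illustration: `pc_relative_offset` and the positions are made up for the example, and the bias of 8 reflects that reading pc on arm (in ARM state) yields the address of the current instruction plus 8.

```ruby
# Hypothetical helper: distance from the program counter to a target,
# both given as positions inside the compiled image.
# pc_bias is 8 because arm (ARM state) reads pc as current instruction + 8.
def pc_relative_offset(instruction_position, target_position, pc_bias = 8)
  target_position - (instruction_position + pc_bias)
end

# Positions known at compile-time, before anyone knows the load address
instruction_at = 0x20
string_at = 0x100
offset = pc_relative_offset(instruction_at, string_at)

# The same offset finds the string wherever the image ends up in memory
resolved = [0x0, 0x8000, 0x40000].map do |load_address|
  pc = load_address + instruction_at + 8
  pc + offset
end
```

Whatever the load address, `pc + offset` lands on the string, which is why nothing needs patching for this kind of access.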

But the smug smile is wiped off the face when one tries to store references. This came with the whole object approach, the bootspace holding references to all objects in the system. I even devised a plan to always store relative addresses. Not relative to pc, but relative to the self that is storing. This i'm sure would have worked fine too, but it does mean that the running program also has to store those relative addresses (or have different address types, shudder). That was a runtime burden i was not willing to accept.

So there are two choices as far as i see: use elf relocation, or relocate in init code. And yet again i find myself biased to the home-grown approach. Of course i see that this is partly because i don't want to learn the innards of elf, which looks like something very complicated that does a simple thing. But also because it is so simple, i am hoping it isn't such a big deal. Most of the code for it (object iteration, type testing, layout decoding) will be useful and necessary later anyway.
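To make the "relocate in init code" idea concrete, here is a minimal sketch in ruby. It is not the actual implementation: `ImageObject`, `reference_slots` and `relocate` are hypothetical names, and the assumption is that the compiler stores references as positions relative to the image start and that each object's layout tells us which words are references.

```ruby
# Hypothetical image object: its words, plus the indexes of those words
# that hold references (assumed to be known from the object's layout).
ImageObject = Struct.new(:position, :words, :reference_slots)

# Walk all objects once at startup and turn image-relative positions
# into absolute addresses by adding the actual load address.
def relocate(objects, load_address)
  objects.each do |object|
    object.reference_slots.each do |slot|
      object.words[slot] += load_address
    end
  end
end

objects = [
  ImageObject.new(0x00, [0x10, 42], [0]), # word 0 refers to the object at 0x10
  ImageObject.new(0x10, [7, 0x00], [1])   # word 1 refers back to the object at 0x00
]
relocate(objects, 0x8000)
```

After the pass, the running program only ever sees absolute addresses, so there is no runtime burden beyond this one walk. Plain integers such as the 42 and 7 above are left alone because they are not listed as reference slots.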

Concise instruction set

So that addressing aside was meant to further the point of a need for a good register instruction set (to write the relocation in). And the code that i have been writing to implement the vm instructions clearly shows a need for a better model at the register level.

On the other hand, the idea of Passes will make it very easy to have a completely separate register machine layer. We just transform the vm to that, and then later from that to arm (or later intel). So there are three things that i am looking for with the new register machine instruction set:

  • easy to understand the model (ie register machine, pc, ..), free of real machine quirks
  • small set of instructions that is needed for our vm
  • better names for instructions

Especially the last one: all the mvn and ldr is getting to me. It's so 50's, as if we didn't have the space to spell out move or load. And even those are not good names, at least i am always wondering what is a move and what a load. And as i explained above in the addressing, if i wanted to load an address of an object into a register with relative addressing, i would actually have to do an add. But when reading an add instruction it is not an intuitive conclusion that a load is meant. And since this is a fresh effort, i would rather change these things now and make it easier for others to learn sensible stuff, than get used to the cryptic names myself only to have everyone after me do the same.

So i will have instructions like RegisterMove, ConstantLoad, Branch, which will translate to mov, ldr and b in arm. I still like to keep the arm level with the traditional names, so people who actually know arm feel right at home. But the extra register layer will make it easier for everyone who has not programmed assembler (and me!), which i am guessing is quite a lot in the ruby community.
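As a rough sketch of what that could look like in ruby (only the three instruction names and their arm counterparts come from the text above; the `Register` module, the struct fields, `ARM_MNEMONICS` and `to_arm` are assumptions made up for illustration):

```ruby
# Hypothetical register layer: descriptive names, no machine quirks
module Register
  RegisterMove = Struct.new(:to, :from)
  ConstantLoad = Struct.new(:register, :constant)
  Branch = Struct.new(:label)
end

# One possible mapping from the register layer to traditional arm names
ARM_MNEMONICS = {
  Register::RegisterMove => "mov",
  Register::ConstantLoad => "ldr",
  Register::Branch => "b"
}

# Render a register instruction as a line of arm-style assembler text
def to_arm(instruction)
  mnemonic = ARM_MNEMONICS.fetch(instruction.class)
  "#{mnemonic} #{instruction.to_a.join(', ')}"
end

program = [
  Register::ConstantLoad.new(:r1, "=message"),
  Register::RegisterMove.new(:r0, :r1),
  Register::Branch.new(:done)
]
arm = program.map { |instruction| to_arm(instruction) }
# arm is ["ldr r1, =message", "mov r0, r1", "b done"]
```

The register layer reads like prose, while the output keeps the names an arm programmer expects. A real translation would produce instruction objects for the assembler rather than strings.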

In implementation terms it is a relatively small step from the vm layer to the register layer. And an even smaller one to the arm layer. But small steps are good, easy to take, easy to understand, no stumbling.