---
layout: news
author: Torsten
---

The register machine abstraction has been somewhat thin, and it is time to change that

### Current affairs

When I started, I started from the assembler side: getting arm binaries working and of course learning the arm cpu instruction set in assembler mnemonics.

Not having any experience at this level, I felt that arm was pretty sensible, much better than I expected. And so I abstracted the basic instruction classes a little and had the arm instructions implement them pretty much one to one.

Then I tried to implement any ruby logic in that abstraction and failed. Thus was born the virtual machine abstraction with its Message, Frame and Self objects. This in turn mapped nicely to registers with indexed addressing.

### Addressing

I just have to sidestep here a little about addressing: the basic problem is of course that we have no idea at compile time at what address the executable will end up.

The problem first emerged with calling functions, mostly because functions were the only objects I had. So I was very happy to find out about pc relative addressing, in which you jump or call relative to your current position (the program counter). Since that relation is not changed by relocation, all is well.

Then came the first strings, and the approach can be extended: instead of grabbing some memory location, i.e. loading an address and dereferencing it, we calculate the address in relation to the pc and then dereference. This is great and works fine.
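As a tiny illustration of why this survives relocation, here is the arithmetic in plain Ruby (just a sketch, the names are made up for this post):

```ruby
# The reference is stored as a distance, not as an absolute address, so
# loading the binary at a different address changes nothing. (On arm the pc
# actually reads a couple of words ahead of the current instruction, which a
# real encoder has to correct for; ignored here.)
def pc_relative_offset(target_position, instruction_position)
  target_position - instruction_position
end

# Loading an object's address then becomes "add register, pc, offset", and
# dereferencing is an ordinary indexed load off that register.
offset = pc_relative_offset(0x2040, 0x2000)   # => 0x40, wherever the binary ends up
```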

But the smug smile is wiped off one's face when one tries to store references. This came with the whole object approach, the bootspace holding references to all objects in the system. I even devised a plan to always store relative addresses: not relative to the pc, but relative to the self that is doing the storing. I'm sure this would have worked fine too, but it does mean that the running program also has to store those relative addresses (or have different address types, shudder). That was a runtime burden I was not willing to accept.

So there are two choices as far as I can see: use elf relocation, or relocate in init code. And yet again I find myself biased towards the home-grown approach. Of course I see that this is partly because I don't want to learn the innards of elf, which strikes me as something very complicated doing a simple thing. But it is also because the job is so simple that I am hoping it isn't such a big deal. Most of the code for it (object iteration, type testing, layout decoding) will be useful and necessary later anyway.
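Just to make the idea concrete, here is a sketch of what such init-code relocation could look like. It assumes an object space that can be walked and objects that know which of their slots hold references; all names are invented for this example:

```ruby
# Hypothetical init-time relocation: every reference was written out as an
# offset from the start of the binary, so adding the actual load address
# turns it back into a real pointer.
def relocate(object_space, load_address)
  object_space.each_object do |object|
    object.reference_slots.each do |slot|
      object.set_slot(slot, object.get_slot(slot) + load_address)
    end
  end
end
```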

### Concise instruction set

That addressing aside was meant to further the point that we need a good register instruction set (to write the relocation in). And the code I have been writing to implement the vm instructions clearly shows the need for a better model at the register level.

On the other hand, the idea of Passes will make it very easy to have a completely separate register machine layer. We just transform the vm to that, and then later from that to arm (or later intel). So there are three things I am looking for in the new register machine instruction set:

- an easy to understand model (i.e. a register machine, pc, ...), free of real machine quirks
- a small set of instructions, just what is needed for our vm
- better names for instructions

Especially the last one: all the mvn and ldr is getting to me. It's so 50's, as if we didn't have the space to spell out move or load. And even those are not good names; at least I am always wondering what is a move and what is a load. And as I explained above in the addressing aside, if I wanted to load the address of an object into a register with relative addressing, I would actually have to do an add. But when reading an add instruction it is not an intuitive conclusion that a load is meant. And since this is a fresh effort, I would rather change these things now and make it easier for others to learn something sensible, than get used to the cryptic names only to have everyone after me do the same.

So I will have instructions like RegisterMove, ConstantLoad and Branch, which will translate to mov, ldr and b on arm. I still like to keep the arm level with the traditional names, so people who actually know arm feel right at home. But the extra register layer will make it easier for everyone who has not programmed assembler (and me!), which I am guessing is quite a lot of the ruby community.
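As a very rough sketch of what that register layer could look like in Ruby (all class and method names here are illustrative, not necessarily what will end up in the code, and the Arm classes are assumed to exist in the arm layer):

```ruby
# Register layer instructions with spelled-out names.
class RegisterMove
  attr_reader :to, :from
  def initialize(to, from)
    @to, @from = to, from
  end
end

class ConstantLoad
  attr_reader :register, :constant
  def initialize(register, constant)
    @register, @constant = register, constant
  end
end

class Branch
  attr_reader :target
  def initialize(target)
    @target = target
  end
end

# A hypothetical translation pass maps each of them one to one onto the
# traditional arm mnemonics (mov / ldr / b).
class ArmTranslationPass
  def translate(instruction)
    case instruction
    when RegisterMove then Arm::Mov.new(instruction.to, instruction.from)
    when ConstantLoad then Arm::Ldr.new(instruction.register, instruction.constant)
    when Branch       then Arm::B.new(instruction.target)
    end
  end
end
```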

In implementation terms it is a relatively small step from the vm layer to the register layer, and an even smaller one to the arm layer. But small steps are good: easy to take, easy to understand, no stumbling.

### Extra Benefits

As I am doing this for my own sanity, any additional benefits are really extra, for free as it were. And those extra benefits clearly exist.

#### Clean interface for cpu specific implementation

That really says it all. The interface was a bit messy, as the RegisterMachine was used in Vm code, but was actually an Arm implementation. So there was no separation. Also, as mentioned, the instruction set was arm heavy, with the quirks even arm has.

So in the future any specific cpu implementation can be quite self sufficient. The classes it uses don't need to derive from anything specific and need only implement the very small code interface (position/length/assemble). And to hook in, all that is needed is a translation from RegisterMachine instructions, which can be done very nicely by providing a Pass for every instruction. That layer of code is quite separate from the actual assembler, so it should be easy to reuse existing code (like wilson or metasm).
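In Ruby that code interface could be as small as this (again only a sketch, assuming plain duck typing; the encoding below is a placeholder):

```ruby
module Arm
  # A cpu specific instruction only has to answer these three questions,
  # so it does not need to derive from anything in the register layer.
  class Mov
    def initialize(to, from)
      @to, @from = to, from
    end

    # the layout step tells the instruction where it will sit in the binary
    attr_accessor :position

    def length          # how many bytes the instruction occupies (4 on arm)
      4
    end

    def assemble(io)    # write the encoded bytes into the given stream
      io << [encode].pack("V")      # one little endian 32 bit word
    end

    private

    def encode
      0xE1A00000   # placeholder bits, the real encoding depends on @to/@from
    end
  end
end
```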

#### Reusable optimisations

Clearly the better separation allows for better optimisations. Concretely, Passes can be written to optimise the RegisterMachine's workings: for example register use, constant extraction from loops, or folding of double moves (when a value is moved from reg1 to reg2 and then from reg2 to reg3, with reg2 never used again).
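The double move case could be written as a pass roughly like this (a sketch that assumes instructions are held in a simple array, that RegisterMove looks like the one above, and that instructions can answer uses?(register); all hypothetical):

```ruby
# Hypothetical pass: "move reg1 -> reg2" followed by "move reg2 -> reg3"
# becomes a single "move reg1 -> reg3", provided reg2 is not used elsewhere.
class FoldDoubleMovesPass
  def run(instructions)
    result = []
    i = 0
    while i < instructions.length
      first, second = instructions[i], instructions[i + 1]
      if foldable?(first, second) && !used_elsewhere?(instructions, first.to, [first, second])
        # emit one move from the original source to the final destination
        result << RegisterMove.new(second.to, first.from)
        i += 2
      else
        result << first
        i += 1
      end
    end
    result
  end

  private

  def foldable?(first, second)
    first.is_a?(RegisterMove) && second.is_a?(RegisterMove) && second.from == first.to
  end

  def used_elsewhere?(instructions, register, except)
    (instructions - except).any? { |ins| ins.uses?(register) }
  end
end
```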

Such optimisations are very general and should be reusable for specific cpu implementations. They are still useful at the RegisterMachine level, mind, as the code is "cleaner" there and it is easier to detect fluff. But the same code may be run after a cpu translation, removing any "fluff" the translation introduced. Thus the translation process can be kept simpler too, as it doesn't need to check for possible optimisations at the same time as translating. Everyone wins :-)