Torsten Ruger
2014-08-22 17:27:57 +03:00
parent f735d6cbc9
commit b100956909
9 changed files with 1 additions and 1 deletions


@ -0,0 +1,36 @@
Register Machine
===============
This is the logic that uses the generated AST to produce code, using the asm layer.
Apart from shuffling things around from one layer to the other, it keeps track of registers and
provides the stack glue. All the stuff a compiler would usually do.
Also all syscalls are abstracted as functions.
The Salama Convention
----------------------
Since we're not in C, we use the registers more suitably for our job:
- return register is _not_ the same as passing registers
- we pin one more register (ala stack/fp) for type information (this is used for returns too)
- one line (8 registers) can be used by a function (caller saved?)
- rest are scratch and may not hold values during call
For Arm this works out as:
- 0 type word (for the line)
- 1-6 argument passing + workspace
- 7 return value
This means syscalls (using 7 for call number and 0 for return) must shuffle a little, but there's space to do it.
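
As a rough sketch of that shuffle, assuming only what is stated above (call number goes in 7, the result comes back in 0): the helper name `syscall_shuffle`, the tuple representation of instructions and the use of `r8` as scratch are all made up for illustration, and argument placement is ignored.

```ruby
# Salama convention on Arm as listed above (the role names are illustrative)
SALAMA_ARM = {
  type_word:    :r0,
  arguments:    [:r1, :r2, :r3, :r4, :r5, :r6],
  return_value: :r7
}.freeze

# Hypothetical helper: the moves placed around a syscall.
# Assumption: r8 is a free scratch register at this point.
def syscall_shuffle(number, scratch = :r8)
  [
    [:mov, scratch, SALAMA_ARM[:type_word]],   # park the type word, r0 gets clobbered
    [:mov, :r7, number],                       # call number where the kernel expects it
    [:swi, 0],                                 # the actual system call
    [:mov, SALAMA_ARM[:return_value], :r0],    # kernel result into our return register
    [:mov, SALAMA_ARM[:type_word], scratch]    # restore the type word
  ]
end

syscall_shuffle(4).each { |inst| puts inst.inspect }
```

So the cost is about two extra moves on each side of the call, which is the "little" shuffling mentioned above.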
Some more detail:
1 - returning in the same register as passing makes that one register a special case, which I want to avoid. Shuffling it gets tricky and involves 2 moves, and for what?
As I see it the benefits of reusing the same register are one more argument register (not needed) and easy chaining of calls, which doesn't really happen so much.
On the plus side, not using the same register makes saving and restoring registers easy (to implement and understand!).
An easy to understand policy is worth gold, as register mistakes are HARD to debug and not what I want to spend my time with just now. So that's settled.
2 - Tagging integers like MRI/BB is a hack which does not extend to other types, such as floats. So we don't use that and instead carry type information externally to the value. This is a burden of course, but then so is tagging.
The convention (to make it easier) is to handle data in lines (8 words) and have one of them carry the type info for the other 7. This is also the object layout and so we reuse that code on the stack.
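
A minimal sketch of such a line, to make the idea concrete. The README does not say how the type word is encoded, so the 4 bits per slot and the type codes below are pure assumptions, as is the class name `Line`.

```ruby
# Hypothetical type codes, not taken from the project
TYPE_CODES = { integer: 1, reference: 2, float: 3 }.freeze

# One line: word 0 is the type word, words 1..7 carry the values
class Line
  SLOTS = 7

  def initialize
    @words = Array.new(SLOTS + 1, 0)
  end

  # store a value in a slot and record its type in the type word
  def set(slot, type, value)
    raise ArgumentError, "slot must be 1..#{SLOTS}" unless (1..SLOTS).cover?(slot)
    shift = 4 * (slot - 1)                       # assumed: 4 type bits per slot
    @words[0] = (@words[0] & ~(0xF << shift)) | (TYPE_CODES.fetch(type) << shift)
    @words[slot] = value
  end

  # returns [type, value]; the type is nil for a slot that was never set
  def get(slot)
    code = (@words[0] >> (4 * (slot - 1))) & 0xF
    [TYPE_CODES.key(code), @words[slot]]
  end
end

line = Line.new
line.set(1, :integer, 42)
p line.get(1)
```

The values themselves stay untagged full words, which is the point of keeping the type external.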

48
lib/register/code.rb Normal file

@ -0,0 +1,48 @@
module Vm
# Base class for anything that we can assemble
# Derived classes include instructions and constants(data)
# The commonality abstracted here is the length and position
# and the ability to assemble itself into the stream(io)
# All code is position independent once assembled.
# But for jumps and calls two passes are necessary.
# The first setting the position, the second assembling
class Code
def class_for clazz
RegisterMachine.instance.class_for(clazz)
end
# set the position to zero, will have to reset later
def initialize
@position = 0
end
# the position in the stream. Think of it as an address if you want. The difference is small.
# Especially since we produce _only_ position independent code
# in other words, during assembly the position _must_ be resolved into a pc relative address
# and not used as is
def position
@position
end
# The containing class (assembler/function) calls this to tell the instruction/data where it is in the
# stream. During assembly the position is then used to calculate pc relative addresses.
def link_at address , context
@position = address
end
# length for this code in bytes
def length
raise "Not implemented #{inspect}"
end
# we pass the io (usually string_io) in for the code to assemble itself.
def assemble(io)
raise "Not implemented #{self.inspect}"
end
end
end
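
The two passes described in the comments above can be sketched roughly as follows. This is a standalone illustration, not the project's assembler: the `Sketch*` classes and `assemble_all` are hypothetical, `link_at` is simplified to one argument, and the jump simply writes the distance between the two positions as a 32 bit little endian word (a real Arm branch would also account for the pc reading 8 bytes ahead).

```ruby
require "stringio"

class SketchCode
  attr_reader :position
  def initialize
    @position = 0
  end
  def link_at(address)
    @position = address
  end
end

# plain data, assembles to its own bytes
class SketchData < SketchCode
  def initialize(bytes)
    super()
    @bytes = bytes
  end
  def length
    @bytes.bytesize
  end
  def assemble(io)
    io.write(@bytes)
  end
end

# a jump that encodes only the distance to its target, never an absolute address
class SketchJump < SketchCode
  def initialize(target)
    super()
    @target = target
  end
  def length
    4
  end
  def assemble(io)
    io.write([@target.position - position].pack("l<"))
  end
end

def assemble_all(codes)
  address = 0
  codes.each do |code|            # pass 1: fix every position
    code.link_at(address)
    address += code.length
  end
  io = StringIO.new("".b)         # binary buffer
  codes.each { |code| code.assemble(io) }   # pass 2: positions are now known
  io.string
end
```

A forward jump shows why one pass is not enough: when the jump is reached, the position of its target has not been visited yet.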

206
lib/register/instruction.rb Normal file

@ -0,0 +1,206 @@
require_relative "code"
module Vm
# Because the idea of what one instruction does, does not always map one to one to real machine
# instructions, an instruction may link to another instruction, thus creating an arbitrary list
# to get the job (the original instruction) done.
# Admittedly it would be simpler just to create the (abstract) instructions and let the machine
# encode them into whatever is necessary, but this approach leaves more possibility to
# optimize the actual instruction stream (not just the salama instruction stream). Makes sense?
# We have basic classes (literally) of instructions
# - Memory
# - Stack
# - Logic
# - Math
# - Control/Compare
# - Move
# - Call
# Instruction derives from Code, for the assembly api
class Instruction < Code
def initialize options
super() # Code#initialize takes no arguments and sets the position to zero
@attributes = options
end
attr_reader :attributes
def opcode
@attributes[:opcode]
end
# abstract: should only be called from derived classes
def to_s
atts = @attributes.dup
atts.delete(:opcode)
atts.delete(:update_status)
atts.delete(:condition_code) if atts[:condition_code] == :al
atts.empty? ? "" : ", #{atts}"
end
# returns an array of registers (RegisterReferences) that this instruction uses.
# ie for r1 = r2 + r3
# which in assembler is add r1 , r2 , r3
# it would return [r2,r3]
# for pushes the list may be longer, whereas for a jump empty
def uses
raise "abstract called for #{self.class}"
end
# returns an array of registers (RegisterReferences) that this instruction assigns to.
# ie for r1 = r2 + r3
# which in assembler is add r1 , r2 , r3
# it would return [r1]
# for most instructions this is one register, for comparisons and jumps none, and for pops as many as 16
def assigns
raise "abstract called for #{self.class}"
end
# set_<name>(value = 1) stores the value in the attributes hash and returns self, so calls chain.
# Everything else goes to super (and so raises NoMethodError).
def method_missing name , *args , &block
return super unless (args.length <= 1) or block_given?
set , attribute = name.to_s.split("set_")
return super unless set == "" and attribute
@attributes[attribute.to_sym] = args[0] || 1
self
end
end
class StackInstruction < Instruction
def initialize first , options = {}
@first = first
super(options)
end
# when calling we place a dummy push/pop in the stream and calculate later what registers actually need saving
def set_registers regs
@first = regs.collect{ |r| r.symbol }
end
def is_push?
opcode == :push
end
def is_pop?
!is_push?
end
def uses
is_push? ? regs : []
end
def assigns
is_pop? ? regs : []
end
def regs
@first
end
def to_s
"#{opcode} [#{@first.collect {|f| f.to_asm}.join(',') }] #{super}"
end
end
class MemoryInstruction < Instruction
def initialize result , left , right = nil , options = {}
@result = result
@left = left
@right = right
super(options)
end
def uses
ret = [@left.register ]
ret << @right.register unless @right.nil?
ret
end
def assigns
[@result.register]
end
end
class LogicInstruction < Instruction
# result = left op right
#
# Logic instructions are your basic operator implementation. But unlike the (normal) code we write
# these Instructions must have a "place" to write their results. Ie when you write 4 + 5 in ruby
# the result is sort of up in the air, but with Instructions the result must be assigned
def initialize result , left , right , options = {}
@result = result
@left = left
@right = right.is_a?(Fixnum) ? IntegerConstant.new(right) : right
super(options)
end
attr_accessor :result , :left , :right
def uses
ret = []
ret << @left.register if @left and not @left.is_a? Constant
ret << @right.register if @right and not @right.is_a?(Constant)
ret
end
def assigns
[@result.register]
end
def to_s
"#{opcode} #{result.to_asm} , #{left.to_asm} , #{right.to_asm} #{super}"
end
end
class CompareInstruction < Instruction
def initialize left , right , options = {}
@left = left
@right = right.is_a?(Fixnum) ? IntegerConstant.new(right) : right
super(options)
end
def uses
ret = [@left.register ]
ret << @right.register unless @right.is_a? Constant
ret
end
def assigns
[]
end
def to_s
"#{opcode} #{@left.to_asm} , #{@right.to_asm} #{super}"
end
end
class MoveInstruction < Instruction
def initialize to , from , options = {}
@to = to
@from = from.is_a?(Fixnum) ? IntegerConstant.new(from) : from
raise "move must have from set #{inspect}" unless from
super(options)
end
attr_accessor :to , :from
def uses
@from.is_a?(Constant) ? [] : [@from.register]
end
def assigns
[@to.register]
end
def to_s
"#{opcode} #{@to.to_asm} , #{@from.to_asm} #{super}"
end
end
class CallInstruction < Instruction
def initialize first , options = {}
@first = first
super(options)
opcode = @attributes[:opcode].to_s
if opcode.length == 3 and opcode[0] == "b"
@attributes[:condition_code] = opcode[1,2].to_sym
@attributes[:opcode] = :b
end
if opcode.length == 6 and opcode[0] == "c"
@attributes[:condition_code] = opcode[4,2].to_sym
@attributes[:opcode] = :call
end
end
def uses
if opcode == :call
@first.args.collect {|arg| arg.register }
else
[]
end
end
def assigns
if opcode == :call
[RegisterReference.new(RegisterMachine.instance.return_register)]
else
[]
end
end
def to_s
"#{opcode} #{@first.to_asm} #{super}"
end
end
end
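
What `uses` and `assigns` are good for can be sketched with a backward scan over a straight-line instruction list. This is an illustration under assumptions: `Inst` and `live_before_each` are hypothetical names, registers are plain symbols, and nothing here claims to be how the project computes its locals.

```ruby
# stand-in for an instruction: only the two register lists matter here
Inst = Struct.new(:name, :uses, :assigns)

# For each instruction, the registers that hold a needed value just before it.
# Walking backwards: a register is live before an instruction if the instruction
# reads it, or if it is live afterwards and not overwritten here.
def live_before_each(instructions)
  live = []
  instructions.reverse.map { |inst|
    live = (live - inst.assigns) | inst.uses
    live.dup
  }.reverse
end

code = [
  Inst.new("mov r2 , 5",       [],         [:r2]),
  Inst.new("add r1 , r2 , r3", [:r2, :r3], [:r1]),
  Inst.new("cmp r1 , 0",       [:r1],      [])
]
p live_before_each(code)   # => [[:r3], [:r2, :r3], [:r1]]
```

Registers that are live across a call are the ones a push/pop pair around that call would have to save.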

51
lib/register/integer.rb Normal file

@ -0,0 +1,51 @@
module Vm
class Integer < Word
# needs to be here as Word's constructor is private (to make it abstract)
def initialize reg
super
end
def less_or_equal block , right
block.cmp( self , right )
Vm::BranchCondition.new :le
end
def greater_or_equal block , right
block.cmp( self , right )
Vm::BranchCondition.new :ge
end
def greater_than block , right
block.cmp( self , right )
Vm::BranchCondition.new :gt
end
def less_than block , right
block.cmp( self , right )
Vm::BranchCondition.new :lt
end
def plus block , left , right
block.add( self , left , right )
self
end
def minus block , left , right
block.sub( self , left , right )
self
end
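# NOTE: the option is named shift_lsr, and lsr is Arm's logical shift _right_; worth checking against the asm layer
def left_shift block , left , right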
block.mov( self , left , shift_lsr: right )
self
end
def equals block , right
block.cmp( self , right )
Vm::BranchCondition.new :eq
end
def is_true? function
function.cmp( self , 0 )
Vm::BranchCondition.new :ne
end
def move block , right
block.mov( self , right )
self
end
end
end

8
lib/register/mystery.rb Normal file

@ -0,0 +1,8 @@
module Vm
class Mystery < Word
# needs to be here as Word's constructor is private (to make it abstract)
def initialize reg
super
end
end
end

108
lib/register/passes.rb Normal file

@ -0,0 +1,108 @@
module Vm
# Passes, or BlockPasses, could have been procs that just get each block passed.
# Instead they are proper objects in case they want to save state.
# The idea is
# - reduce noise in the main code by having this code separately (aspect/concern style)
# - abstract the iteration
# - allow not yet written code to hook in
class RemoveStubs
def run block
block.codes.dup.each_with_index do |kode , index|
next unless kode.is_a? StackInstruction
if kode.regs.empty?
block.codes.delete(kode)
puts "deleted stack instruction in #{block.name}"
end
end
end
end
# Operators eg a + b , must assign their result somewhere and as such create temporary variables.
# but if code is c = a + b , the generated instructions would be more like tmp = a + b ; c = tmp
# So if there is a move instruction just after a logic instruction where the result of the logic
# instruction is moved straight away, we can undo that mess and remove one instruction.
class LogicMoveReduction
def run block
org = block.codes.dup
org.each_with_index do |kode , index|
n = org[index+1]
next if n.nil?
next unless kode.is_a? LogicInstruction
next unless n.is_a? MoveInstruction
# arm-specific instructions: don't optimize, as we don't know what the extra attributes mean
# small todo: this does not catch condition_codes that are not :al
next if (n.attributes.length > 3) or (kode.attributes.length > 3)
if kode.result == n.from
puts "Logic reduction #{kode} removes #{n}"
kode.result = n.to
block.codes.delete(n)
end
end
end
end
# Sometimes there are double moves ie mov a, b and mov b , c . We reduce that to move a , c
# (but don't check if that improves register allocation. Yet ?)
class MoveMoveReduction
def run block
org = block.codes.dup
org.each_with_index do |kode , index|
n = org[index+1]
next if n.nil?
next unless kode.is_a? MoveInstruction
next unless n.is_a? MoveInstruction
# arm-specific instructions: don't optimize, as we don't know what the extra attributes mean
# small todo: this does not catch condition_codes that are not :al
next if (n.attributes.length > 3) or (kode.attributes.length > 3)
if kode.to == n.from
puts "Move reduction #{kode}: removes #{n} "
kode.to = n.to
block.codes.delete(n)
end
end
end
end
# As the name says, removes no-ops. Currently only mov x , x is supported
class NoopReduction
def run block
block.codes.dup.each_with_index do |kode , index|
next unless kode.is_a? MoveInstruction
# arm-specific instructions: don't optimize, as we don't know what the extra attributes mean
# small todo: this does not catch condition_codes that are not :al
next if (kode.attributes.length > 3)
if kode.to == kode.from
block.codes.delete(kode)
puts "deleted noop move in #{block.name} #{kode}"
end
end
end
end
# We insert push/pops as dummies to fill them later in CallSaving
# as we can not know ahead of time which locals will be live in the code to come
# and also we don't want to "guess" later where the push/pops should be
# Here we check which registers need saving and add them
# Or sometimes just remove the push/pops, when no locals needed saving
class SaveLocals
def run block
push = block.call_block?
return unless push
return unless block.function
locals = block.function.locals_at block
pop = block.next.codes.first
if(locals.empty?)
#puts "Empty #{block.name}"
block.codes.delete(push)
block.next.codes.delete(pop)
else
#puts "PUSH #{push}"
push.set_registers(locals)
#puts "POP #{pop}"
pop.set_registers(locals)
end
end
end
end
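
The move-move idea can be sketched in isolation like this. It is a simplified illustration, not the pass above: `Move` and `reduce_moves` are hypothetical, operands are plain symbols in `mov to , from` order, and the result is built as a new list instead of deleting from the block. Like the pass above, it does not check whether the intermediate register is still read later.

```ruby
# mov to , from
Move = Struct.new(:to, :from)

# Fold "mov b , a" followed directly by "mov c , b" into "mov c , a".
def reduce_moves(codes)
  codes.each_with_object([]) do |code, out|
    prev = out.last
    if prev.is_a?(Move) && code.is_a?(Move) && prev.to == code.from
      prev.to = code.to        # skip the intermediate register
    else
      out << code.dup          # leave the caller's instructions untouched
    end
  end
end

p reduce_moves([Move.new(:r2, :r1), Move.new(:r3, :r2)])
```

Because each instruction is compared against the last one kept, a chain of three or more moves collapses into a single move as well.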

30
lib/register/reference.rb Normal file

@ -0,0 +1,30 @@
module Vm
class Reference < Word
# needs to be here as Word's constructor is private (to make it abstract)
def initialize reg , clazz = nil
super(reg)
@clazz = clazz
end
attr_accessor :clazz
def load block , right
if(right.is_a? IntegerConstant)
block.mov( self , right ) #move the value
elsif right.is_a? StringConstant
block.add( self , right , nil) #move the address, by "adding" to pc, ie pc relative
block.mov( Integer.new(self.register.next_reg_use) , right.length ) #and the length HACK TODO
elsif right.is_a?(Boot::BootClass) or right.is_a?(Boot::MetaClass)
block.add( self , right , nil) #move the address, by "adding" to pc, ie pc relative
else
raise "unknown #{right.inspect}"
end
self
end
def at_index block , left , right
block.ldr( self , left , right )
self
end
end
end


@ -0,0 +1,143 @@
module Vm
# Our virtual c-machine has a number of registers of a given size and uses a stack
# So far so standard
# But our machine is oo, meaning that the register contents are typed.
# Of course current hardware does not have that (a perceived issue), but for our machine we pretend.
# So internally we have at least 8 word registers, one of which is used to keep track of types*
# and any number of scratch registers
# but externally it's all Values (see there)
# * Note that register content is typed externally. Not as in mri, where ints are tagged. Floats can't
# be tagged and lambda should be its own type, so tagging does not work
# A Machine's main responsibility in the framework is to instantiate Instructions
# Value functions are mapped to machines by concatenating the value's class name + the method name
# Example: IntegerValue.plus( value ) -> Machine.signed_plus (value )
# Also, shortcuts are created to easily instantiate Instruction objects. The "standard" set of instructions
# (arm-influenced) provides for normal operations on a register machine,
# Example: pop -> StackInstruction.new( {:opcode => :pop}.merge(options) )
# Instructions work with options, so you can pass anything in, and the only thing the function does
# is save you typing the clazz.new. It passes the function name as the :opcode
class RegisterMachine
# hmm, not pretty but for now
@@instance = nil
attr_reader :registers
attr_reader :scratch
attr_reader :pc
attr_reader :stack
# is often a pseudo register (ie doesn't support move or other operations).
# Still, using it to express tests makes sense, not just for
# consistency in this code, but also because that is what is actually done
attr_reader :status
# conditions specify all the possibilities for branches. Branches are b + condition
# Example: beq means branch if equal.
# :al means always, so bal is an unconditional branch (but b() also works)
CONDITIONS = [ :al , :eq , :ne , :lt , :le, :ge, :gt , :cs , :mi , :hi , :cc , :pl, :ls , :vc , :vs ]
# here we create the shortcuts for the "standard" instructions, see above
# Derived machines may use own instructions and define functions for them if so desired
def initialize
[:push, :pop].each do |inst|
define_instruction_one(inst , StackInstruction)
end
[:adc, :add, :and, :bic, :eor, :orr, :rsb, :rsc, :sbc, :sub].each do |inst|
define_instruction_three(inst , LogicInstruction)
end
[:mov, :mvn].each do |inst|
define_instruction_two(inst , MoveInstruction)
end
[:cmn, :cmp, :teq, :tst].each do |inst|
define_instruction_two(inst , CompareInstruction)
end
[:strb, :str , :ldrb, :ldr].each do |inst|
define_instruction_three(inst , MemoryInstruction)
end
[:b, :call , :swi].each do |inst|
define_instruction_one(inst , CallInstruction)
end
# create all possible branch instructions, but the CallInstruction demangles the
# code, and has opcode set to :b and :condition_code set to the condition
CONDITIONS.each do |suffix|
define_instruction_one("b#{suffix}".to_sym , CallInstruction)
define_instruction_one("call#{suffix}".to_sym , CallInstruction)
end
end
def create_method(name, &block)
self.class.send(:define_method, name , &block)
end
def self.instance
@@instance
end
def self.instance= machine
@@instance = machine
end
def class_for clazz
c_name = clazz.name
my_module = self.class.name.split("::").first
clazz_name = clazz.name.split("::").last
if(my_module != "Vm" )
module_class = eval("#{my_module}::#{clazz_name}") rescue nil
clazz = module_class if module_class
end
clazz
end
private
# defines the instruction (opcode, symbol) as a given class.
# the class is a Vm::Instruction derived base class and to create machine specific functions
# an actual machine must create derived classes (from this base class)
# These instruction classes must follow a naming pattern and take a hash in the constructor
# Example, a mov() opcode instantiates a Vm::MoveInstruction
# for an Arm machine, a class Arm::MoveInstruction < Vm::MoveInstruction exists, and it will
# be used to define the mov on an arm machine.
# This method picks up that derived class and calls a define_instruction method that can
# be overridden in subclasses
def define_instruction_one(inst , clazz , defaults = {} )
clazz = self.class_for(clazz)
create_method(inst) do |first , options = nil|
options = {} if options == nil
options = defaults.merge(options) # merge returns a new hash; explicitly passed options win over defaults
options[:opcode] = inst
first = Vm::Integer.new(first) if first.is_a? Symbol
clazz.new(first , options)
end
end
# same for two args (left right, from to etc)
def define_instruction_two(inst , clazz , defaults = {} )
clazz = self.class_for(clazz)
create_method(inst) do |left ,right , options = nil|
options = {} if options == nil
options = defaults.merge(options) # merge returns a new hash; explicitly passed options win over defaults
left = Vm::Integer.new(left) if left.is_a? Symbol
right = Vm::Integer.new(right) if right.is_a? Symbol
options[:opcode] = inst
clazz.new(left , right ,options)
end
end
# same for three args (result = left right,)
def define_instruction_three(inst , clazz , defaults = {} )
clazz = self.class_for(clazz)
create_method(inst) do |result , left ,right = nil , options = nil|
options = {} if options == nil
options = defaults.merge(options) # merge returns a new hash; explicitly passed options win over defaults
options[:opcode] = inst
result = Vm::Integer.new(result) if result.is_a? Symbol
left = Vm::Integer.new(left) if left.is_a? Symbol
right = Vm::Integer.new(right) if right.is_a? Symbol
clazz.new(result, left , right ,options)
end
end
end
end
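
The shortcut mechanism can be sketched on its own as follows. It is a reduced illustration with hypothetical names (`SketchMachine`, `SketchInstruction`, `define_shortcut`): one generic definer with an operand count replaces the three `define_instruction_*` variants, and keyword options replace the trailing hash.

```ruby
# stand-in for an Instruction: remembers operands and the attribute hash
class SketchInstruction
  attr_reader :operands, :attributes
  def initialize(operands, attributes)
    @operands = operands
    @attributes = attributes
  end
end

class SketchMachine
  def initialize
    [:mov, :mvn].each { |opcode| define_shortcut(opcode, SketchInstruction, 2) }
    [:add, :sub].each { |opcode| define_shortcut(opcode, SketchInstruction, 3) }
  end

  # defines a method named after the opcode that instantiates clazz,
  # passing the method name along as the :opcode attribute
  def define_shortcut(opcode, clazz, count, defaults = {})
    self.class.send(:define_method, opcode) do |*operands, **options|
      unless operands.length == count
        raise ArgumentError, "#{opcode} takes #{count} operands"
      end
      clazz.new(operands, defaults.merge(options).merge(opcode: opcode))
    end
  end
end

machine = SketchMachine.new
inst = machine.add(:r1, :r2, :r3, update_status: 1)
p inst.attributes
```

Since the methods are defined on the class, a derived machine could hand in its own instruction classes and callers would still just write `machine.add(...)`.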


@ -0,0 +1,33 @@
module Vm
# RegisterReference is not the name for a register, "only" for a certain use of it.
# In a way it is like a variable name, a storage location. The location is a register of course,
# but which register can be changed, and _all_ instructions sharing the RegisterReference then use that register
# In other words a simple level of indirection, or change from value to reference semantics.
class RegisterReference
attr_accessor :symbol
def initialize r
if( r.is_a? Fixnum)
r = "r#{r}".to_sym
end
raise "wrong type for register init #{r}" unless r.is_a? Symbol
raise "double r #{r}" if r == :rr1
@symbol = r
end
def == other
return false if other.nil?
return false if other.class != RegisterReference
symbol == other.symbol
end
#helper method to calculate with register symbols
def next_reg_use by = 1
int = @symbol[1,3].to_i
sym = "r#{int + by}".to_sym
RegisterReference.new( sym )
end
end
end
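
The indirection described in the comment above can be shown with a tiny standalone sketch. `SketchRegister` and `SketchMove` are hypothetical stand-ins, not the project's classes: two instructions share one reference object, so reassigning its symbol changes both.

```ruby
class SketchRegister
  attr_accessor :symbol
  def initialize(symbol)
    @symbol = symbol
  end
end

# mov to , from where both operands are register references
SketchMove = Struct.new(:to, :from) do
  def to_s
    "mov #{to.symbol} , #{from.symbol}"
  end
end

tmp = SketchRegister.new(:r1)
code = [
  SketchMove.new(tmp, SketchRegister.new(:r2)),
  SketchMove.new(SketchRegister.new(:r3), tmp)
]

tmp.symbol = :r5              # one assignment, e.g. made by a register allocator
listing = code.map(&:to_s)
puts listing                  # prints "mov r5 , r2" then "mov r3 , r5"
```

With plain symbols in the instructions, the same change would mean finding and patching every instruction that mentions the register.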