renamed

2014-08-22 17:27:57 +03:00
parent f735d6cbc9
commit b100956909
9 changed files with 1 additions and 1 deletions
--- a/lib/register/README.markdown
+++ b/lib/register/README.markdown
@ -0,0 +1,36 @@
+Register Machine 
+===============
+
+This is the logic that uses the generated AST to produce code, using the asm layer.
+
+Apart from shuffling things around from one layer to the other, it keeps track of registers and
+provides the stack glue. All the stuff a compiler would usually do.
+
+Also all syscalls are abstracted as functions.
+
+The Salama Convention 
+----------------------
+
+Since we're not in C, we use the registers in a way that suits our job better:
+
+- return register is _not_ the same as passing registers
+- we pin one more register (ala stack/fp) for type information (this is used for returns too)
+- one line (8 registers) can be used by a function (caller saved?)
+- rest are scratch and may not hold values during call
+
+For Arm this works out as:
+- 0 type word (for the line)
+- 1-6 argument passing + workspace
+- 7 return value
+
+This means syscalls (using 7 for call number and 0 for return) must shuffle a little, but there's space to do it.
+Some more detail:
+
+1 - Returning in the same register as passing makes that one register a special case, which I want to avoid. Shuffling it gets tricky and involves 2 moves, and for what?
+As I see it, the benefits of reusing the same register are one more argument register (not needed) and easy chaining of calls, which doesn't really happen that much.
+On the plus side, not using the same register makes saving and restoring registers easy (to implement and understand!). 
+An easy to understand policy is worth gold, as register mistakes are HARD to debug and not what I want to spend my time on just now. So that's settled.
+
+2 - Tagging integers like MRI/BB is a hack which does not extend to other types, such as floats. So we don't use that and instead carry type information externally to the value. This is a burden of course, but then so is tagging. 
+The convention (to make it easier) is to handle data in lines (8 words) and have one of them carry the type info for the other 7. This is also the object layout and so we reuse that code on the stack.
+
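The register roles the README lists for Arm can be restated as a small lookup. This is only a sketch: `salama_role` and `LINE_WORDS` are hypothetical names that do not exist in the commit, and treating every register above 7 as scratch is an assumption based on the "rest are scratch" bullet.

```ruby
# Hypothetical helper, not part of the commit: names the role of an Arm
# register number under the convention described in the README.
def salama_role(number)
  case number
  when 0    then :type_word              # type info for the line
  when 1..6 then :argument_or_workspace  # argument passing + workspace
  when 7    then :return_value           # deliberately not an argument register
  else           :scratch                # assumed: everything else is scratch
  end
end

# One "line" is 8 words: word 0 carries the type info for the other 7.
LINE_WORDS = 8
line = (0...LINE_WORDS).map { |n| [:"r#{n}", salama_role(n)] }
```

Here `line` pairs each of the eight registers of a line with its role, which makes it visible that the return register never overlaps with the argument registers.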
--- a/lib/register/code.rb
+++ b/lib/register/code.rb
@ -0,0 +1,48 @@
+module Vm
+  # Base class for anything that we can assemble
+
+  # Derived classes include instructions and constants(data)
+  
+  # The commonality abstracted here is the length and position
+  # and the ability to assemble itself into the stream(io)
+  
+  # All code is position independent once assembled.
+  # But for jumps and calls two passes are necessary.
+  # The first sets the position, the second assembles
+  class Code
+    
+    def class_for clazz
+      RegisterMachine.instance.class_for(clazz)
+    end
+
+    # set the position to zero, will have to reset later
+    def initialize
+      @position = 0
+    end
+
+    # the position in the stream. Think of it as an address if you want. The difference is small.
+    # Especially since we produce _only_ position independent code
+    # in other words, during assembly the position _must_ be resolved into a pc relative address
+    # and not used as is
+    def position
+      @position 
+    end
+    
+    # The containing class (assembler/function) calls this to tell the instruction/data where it is in the
+    # stream. During assembly the position is then used to calculate pc relative addresses.
+    def link_at address , context
+      @position = address
+    end
+    
+    # length for this code in bytes
+    def length
+      raise "Not implemented #{inspect}"
+    end
+    
+    # we pass the io (usually string_io) in for the code to assemble itself.
+    def assemble(io)
+      raise "Not implemented #{self.inspect}"
+    end
+    
+  end
+end
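The two passes described in the `Code` comments (first `link_at`, then `assemble`) can be sketched as follows. `SketchJump` is a hypothetical stand-in, not a class from this commit, and the 4-byte little-endian offset encoding is an assumption chosen only to keep the example small.

```ruby
require "stringio"

# Hypothetical stand-in for a Vm::Code subclass: four bytes that encode
# the distance to a target, relative to the code's own position.
class SketchJump
  attr_reader :position
  attr_accessor :target

  def initialize
    @position = 0
  end

  def link_at(address, _context)
    @position = address
  end

  def length
    4
  end

  def assemble(io)
    # position is only used to compute a relative offset, never emitted as is
    io.write([target.position - position].pack("l<"))
  end
end

first = SketchJump.new
second = SketchJump.new
first.target = second   # forward reference
second.target = first   # backward reference
codes = [first, second]

# Pass 1: give every code its position in the stream.
address = 0
codes.each do |code|
  code.link_at(address, nil)
  address += code.length
end

# Pass 2: assemble, now that every target position is known.
io = StringIO.new(String.new)
codes.each { |code| code.assemble(io) }
offsets = io.string.unpack("l<*")
```

The forward reference is the reason a single pass is not enough: when `first` is reached, the position of `second` is only known because pass 1 has already run.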
--- a/lib/register/instruction.rb
+++ b/lib/register/instruction.rb
@ -0,0 +1,206 @@
+require_relative "code"
+module Vm
+
+  # Because the idea of what one instruction does, does not always map one to one to real machine
+  # instructions, an instruction may link to another instruction, thus creating an arbitrary list
+  # to get the job (the original instruction) done
+  
+  # Admittedly it would be simpler just to create the (abstract) instructions and let the machine 
+  # encode them into whatever is necessary, but this approach leaves more possibility to 
+  # optimize the actual instruction stream (not just the salama instruction stream). Makes sense?
+  
+  # We have basic classes (literally) of instructions
+  # - Memory
+  # - Stack
+  # - Logic
+  # - Math
+  # - Control/Compare
+  # - Move
+  # - Call
+  
+  # Instruction derives from Code, for the assembly api
+  
+  class Instruction < Code    
+    def initialize  options
+      @attributes = options
+    end
+    attr_reader :attributes
+    def opcode
+      @attributes[:opcode]
+    end
+    #abstract, only should be called from derived
+    def to_s
+      atts = @attributes.dup
+      atts.delete(:opcode)
+      atts.delete(:update_status)
+      atts.delete(:condition_code) if atts[:condition_code] == :al
+      atts.empty? ? "" : ", #{atts}"
+    end
+    # returns an array of registers (RegisterReferences) that this instruction uses.
+    # ie for r1 = r2 + r3 
+    # which in assembler is add r1 , r2 , r3
+    # it would return [r2,r3]
+    # for pushes the list may be longer, whereas for a jump empty
+    def uses
+      raise "abstract called for #{self.class}"
+    end
+    # returns an array of registers (RegisterReferences) that this instruction assigns to.
+    # ie for r1 = r2 + r3 
+    # which in assembler is add r1 , r2 , r3
+    # it would return [r1]
+    # for most instructions this is one register, for comparisons and jumps none, and for pops up to 16
+    def assigns
+      raise "abstract called for #{self.class}"
+    end
+    # set_<attribute>(value = 1) stores the attribute and returns self, so calls can be chained.
+    # Anything else goes to super (and so raises NoMethodError).
+    def method_missing name , *args , &block 
+      return super if args.length > 1 or block_given?
+      set , attribute = name.to_s.split("set_")
+      return super unless set == ""
+      @attributes[attribute.to_sym] = args[0] || 1
+      self
+    end
+  end
+  
+  class StackInstruction < Instruction
+    def initialize first , options = {}
+      @first = first
+      super(options)
+    end
+    # when calling we place a dummy push/pop in the stream and calculate later what registers actually need saving 
+    def set_registers regs
+      @first = regs.collect{ |r| r.symbol }
+    end
+    def is_push?
+      opcode == :push
+    end
+    def is_pop?
+      !is_push?
+    end
+    def uses
+      is_push? ? regs : []
+    end
+    def assigns
+      is_pop? ? regs : []
+    end
+    def regs
+      @first
+    end
+    def to_s
+      "#{opcode} [#{@first.collect {|f| f.to_asm}.join(',') }] #{super}"
+    end
+  end
+  class MemoryInstruction < Instruction
+    def initialize result , left , right = nil , options = {}
+      @result = result
+      @left = left
+      @right = right
+      super(options)
+    end
+    def uses
+      ret = [@left.register ]
+      ret << @right.register unless @right.nil?
+      ret
+    end
+    def assigns
+      [@result.register]
+    end
+  end
+  class LogicInstruction < Instruction
+    #  result = left op right
+    # 
+    # Logic instructions are your basic operator implementation. But unlike the (normal) code we write
+    #    these Instructions must have a "place" to write their results. Ie when you write 4 + 5 in ruby
+    #    the result is sort of up in the air, but with Instructions the result must be assigned 
+    def initialize result , left , right , options = {}
+      @result = result
+      @left = left
+      @right = right.is_a?(Fixnum) ? IntegerConstant.new(right) : right
+      super(options)
+    end
+    attr_accessor :result , :left ,  :right
+    def uses
+      ret = []
+      ret << @left.register if @left and not @left.is_a? Constant
+      ret << @right.register if @right and not @right.is_a?(Constant)
+      ret
+    end
+    def assigns
+      [@result.register]
+    end
+    def to_s
+      "#{opcode} #{result.to_asm} , #{left.to_asm} , #{right.to_asm} #{super}"
+    end
+  end
+  class CompareInstruction < Instruction
+    def initialize left , right , options  = {}
+      @left = left
+      @right = right.is_a?(Fixnum) ? IntegerConstant.new(right) : right
+      super(options)
+    end
+    def uses
+      ret = [@left.register ]
+      ret << @right.register unless @right.is_a? Constant
+      ret
+    end
+    def assigns
+      []
+    end
+    def to_s
+      "#{opcode} #{@left.to_asm} , #{@right.to_asm} #{super}"
+    end
+  end
+  class MoveInstruction < Instruction
+    def initialize to , from , options = {}
+      @to = to
+      @from = from.is_a?(Fixnum) ? IntegerConstant.new(from) : from
+      raise "move must have from set #{inspect}" unless from
+      super(options)
+    end
+    attr_accessor :to , :from
+    def uses
+      @from.is_a?(Constant) ? [] : [@from.register]
+    end
+    def assigns
+      [@to.register]
+    end
+    def to_s
+      "#{opcode} #{@to.to_asm} , #{@from.to_asm} #{super}"
+    end
+  end
+  class CallInstruction < Instruction
+    def initialize first , options  = {}
+      @first = first
+      super(options)
+      opcode = @attributes[:opcode].to_s
+      if opcode.length == 3 and opcode[0] == "b"
+        @attributes[:condition_code] = opcode[1,2].to_sym
+        @attributes[:opcode] = :b
+      end
+      if opcode.length == 6 and opcode[0] == "c"
+        @attributes[:condition_code] = opcode[4,2].to_sym
+        @attributes[:opcode] = :call
+      end
+    end
+    def uses
+      if opcode == :call
+        @first.args.collect {|arg| arg.register }
+      else
+        []
+      end
+    end
+    def assigns
+      if opcode == :call
+        [RegisterReference.new(RegisterMachine.instance.return_register)]
+      else
+        []
+      end
+    end
+    def to_s
+      "#{opcode} #{@first.to_asm} #{super}"
+    end
+  end
+end
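The opcode demangling in `CallInstruction#initialize` can be tried in isolation with a sketch like this. `demangle` is a hypothetical helper that only mirrors the two length checks shown above; it is not a method of the commit.

```ruby
# Hypothetical helper mirroring the demangling in CallInstruction#initialize:
# a 3 letter opcode starting with "b" is a branch plus condition,
# a 6 letter opcode starting with "c" is a call plus condition.
def demangle(opcode)
  name = opcode.to_s
  if name.length == 3 and name[0] == "b"
    { opcode: :b, condition_code: name[1, 2].to_sym }
  elsif name.length == 6 and name[0] == "c"
    { opcode: :call, condition_code: name[4, 2].to_sym }
  else
    { opcode: opcode }
  end
end

branch = demangle(:beq)     # conditional branch
call   = demangle(:callne)  # conditional call
plain  = demangle(:swi)     # nothing to demangle
```

The check relies purely on length and first letter, so it is only safe for opcodes that are routed to `CallInstruction`. A logic opcode such as `:bic` has the same shape and would be misread if it ever went through this path.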
--- a/lib/register/integer.rb
+++ b/lib/register/integer.rb
@ -0,0 +1,51 @@
+module Vm
+  class Integer < Word
+    # needs to be here as Word's constructor is private (to make it abstract)
+    def initialize reg
+      super
+    end
+
+    def less_or_equal block , right
+      block.cmp( self ,  right )
+      Vm::BranchCondition.new :le
+    end
+    def greater_or_equal block , right
+      block.cmp( self ,  right )
+      Vm::BranchCondition.new :ge
+    end
+    def greater_than block , right
+      block.cmp( self ,  right )
+      Vm::BranchCondition.new :gt
+    end
+    def less_than block , right
+      block.cmp( self ,  right )
+      Vm::BranchCondition.new :lt
+    end
+    def plus block , left , right
+      block.add( self , left ,  right )
+      self
+    end
+    def minus block , left , right
+      block.sub( self ,  left ,  right )
+      self
+    end
+    def left_shift block , left , right
+      block.mov( self ,  left , shift_lsr: right )
+      self
+    end
+    def equals block , right
+      block.cmp( self ,  right )
+      Vm::BranchCondition.new :eq
+    end
+
+    def is_true? function
+      function.cmp( self ,  0 )
+      Vm::BranchCondition.new :ne
+    end
+
+    def move block , right
+      block.mov(  self ,  right )
+      self
+    end
+  end
+end
--- a/lib/register/mystery.rb
+++ b/lib/register/mystery.rb
@ -0,0 +1,8 @@
+module Vm
+  class Mystery < Word
+    # needs to be here as Word's constructor is private (to make it abstract)
+    def initialize reg
+      super
+    end
+  end
+end
--- a/lib/register/passes.rb
+++ b/lib/register/passes.rb
@ -0,0 +1,108 @@
+module Vm
+  # Passes, or BlockPasses, could have been procs that just get each block passed.
+  # Instead they are proper objects in case they want to save state.
+  # The idea is 
+  # - reduce noise in the main code by having this code separately (aspect/concern style)
+  # - abstract the iteration
+  # - allow not yet written code to hook in
+  
+  class RemoveStubs
+    def run block
+      block.codes.dup.each_with_index do |kode , index|
+        next unless kode.is_a? StackInstruction
+        # StackInstruction exposes its register list as regs
+        if kode.regs.empty?
+          block.codes.delete(kode) 
+          puts "deleted stack instruction in #{block.name}"
+        end
+      end
+    end
+  end
+
+  # Operators eg a + b , must assign their result somewhere and as such create temporary variables.
+  # but if code is c = a + b , the generated instructions would be more like tmp = a + b ; c = tmp
+  # So if there is a move instruction just after a logic instruction where the result of the logic
+  # instruction is moved straight away, we can undo that mess and remove one instruction.
+  class LogicMoveReduction
+    def run block
+      org = block.codes.dup
+      org.each_with_index do |kode , index|
+        n = org[index+1]
+        next if n.nil?
+        next unless kode.is_a? LogicInstruction
+        next unless n.is_a? MoveInstruction
+        # specific arm instructions, don't optimize as don't know what the extra mean
+        # small todo. This does not catch condition_code that are not :al
+        next if (n.attributes.length > 3) or (kode.attributes.length > 3)
+        if kode.result == n.from
+          puts "Logic reduction #{kode} removes #{n}"
+          kode.result = n.to
+          block.codes.delete(n)
+        end
+      end
+    end
+  end
+
+  # Sometimes there are double moves ie mov a, b and mov b , c . We reduce that to move a , c 
+  # (but don't check if that improves register allocation. Yet ?) 
+  class MoveMoveReduction
+    def run block
+      org = block.codes.dup
+      org.each_with_index do |kode , index|
+        n = org[index+1]
+        next if n.nil?
+        next unless kode.is_a? MoveInstruction
+        next unless n.is_a? MoveInstruction
+        # specific arm instructions, don't optimize as don't know what the extra mean
+        # small todo. This does not catch condition_code that are not :al
+        next if (n.attributes.length > 3) or (kode.attributes.length > 3)
+        if kode.to == n.from
+          puts "Move reduction #{kode}: removes #{n} "
+          kode.to = n.to
+          block.codes.delete(n)
+        end
+      end
+    end
+  end
+
+  #As the name says, remove no-ops. Currently mov x , x supported
+  class NoopReduction
+    def run block
+      block.codes.dup.each_with_index do |kode , index|
+        next unless kode.is_a? MoveInstruction
+        # specific arm instructions, don't optimize as don't know what the extra mean
+        # small todo. This does not catch condition_code that are not :al
+        next if (kode.attributes.length > 3)
+        if kode.to == kode.from
+          block.codes.delete(kode) 
+          puts "deleted noop move in #{block.name} #{kode}"
+        end
+      end
+    end
+  end
+
+  # We insert push/pops as dummies to fill them later in CallSaving
+  # as we can not know ahead of time which locals will be live in the code to come
+  # and also we don't want to "guess" later where the push/pops should be
+  
+  # Here we check which registers need saving and add them
+  # Or sometimes just remove the push/pops, when no locals needed saving
+  class SaveLocals
+    def run block
+      push = block.call_block?
+      return unless push
+      return unless block.function
+      locals = block.function.locals_at block
+      pop = block.next.codes.first
+      if(locals.empty?)
+        #puts "Empty #{block.name}"
+        block.codes.delete(push)
+        block.next.codes.delete(pop)
+      else
+        #puts "PUSH #{push}"
+        push.set_registers(locals)
+        #puts "POP #{pop}"
+        pop.set_registers(locals)
+      end
+    end
+  end
+end
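The move/move reduction can be tried on plain data with a sketch like the one below. `Move` and `reduce_moves` are hypothetical names, the attribute-length guard of the real pass is left out, and, like the pass itself, the sketch only looks at adjacent pairs and does not check whether the intermediate register is still needed later.

```ruby
# Hypothetical minimal move: copies the value of `from` into `to`.
Move = Struct.new(:to, :from)

# Same idea as MoveMoveReduction#run, on a plain array: when a move's target
# is the source of the very next move, retarget the first move and drop
# the second.
def reduce_moves(codes)
  original = codes.dup   # iterate over a copy, as we delete from codes
  original.each_with_index do |code, index|
    following = original[index + 1]
    next if following.nil?
    next unless code.to == following.from
    code.to = following.to
    codes.delete(following)
  end
  codes
end

codes = [Move.new(:tmp, :a), Move.new(:c, :tmp)]
reduce_moves(codes)
```

After the call `codes` holds a single move from `:a` straight to `:c`, which is the "mov a , c" result described in the comment on `MoveMoveReduction`.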
--- a/lib/register/reference.rb
+++ b/lib/register/reference.rb
@ -0,0 +1,30 @@
+module Vm
+  class Reference < Word
+    # needs to be here as Word's constructor is private (to make it abstract)
+    def initialize reg , clazz = nil
+      super(reg)
+      @clazz = clazz
+    end
+    attr_accessor :clazz
+
+    def load block , right
+      if(right.is_a? IntegerConstant)
+        block.mov(  self ,  right )  #move the value
+      elsif right.is_a? StringConstant
+        block.add( self , right , nil)   #move the address, by "adding" to pc, ie pc relative
+        block.mov( Integer.new(self.register.next_reg_use) ,  right.length )  #and the length HACK TODO
+      elsif right.is_a?(Boot::BootClass) or right.is_a?(Boot::MetaClass)
+        block.add( self , right , nil)   #move the address, by "adding" to pc, ie pc relative
+      else
+        raise "unknown #{right.inspect}" 
+      end
+      self
+    end
+
+    def at_index block , left , right
+      block.ldr( self , left , right )
+      self
+    end
+
+  end
+end
--- a/lib/register/register_machine.rb
+++ b/lib/register/register_machine.rb
@ -0,0 +1,143 @@
+module Vm
+
+  # Our virtual c-machine has a number of registers of a given size and uses a stack
+  # So far, so standard
+  # But our machine is oo, meaning that the register contents are typed. 
+  # Of course current hardware does not have that (a perceived issue), but for our machine we pretend.
+  # So internally we have at least 8 word registers, one of which is used to keep track of types*
+  # and any number of scratch registers
+  # but externally it's all Values (see there)
+  
+  # * Note that register content is typed externally. Not as in mri, where ints are tagged. Floats can't
+  #   be tagged and lambda should be its own type, so tagging does not work
+  
+  # A Machine's main responsibility in the framework is to instantiate Instructions
+
+  # Value functions are mapped to machines by concatenating the value's class name + the method name
+  # Example:  IntegerValue.plus( value ) ->  Machine.signed_plus (value )
+  
+  # Also, shortcuts are created to easily instantiate Instruction objects. The "standard" set of instructions
+  # (arm-influenced) provides for normal operations on a register machine, 
+  # Example:  pop -> StackInstruction.new( {:opcode => :pop}.merge(options) )
+  # Instructions work with options, so you can pass anything in, and the only thing the function does
+  # is save you typing the clazz.new. It passes the function name as the :opcode
+   
+  class RegisterMachine
+  
+    # hmm, not pretty but for now
+    @@instance = nil
+    
+    attr_reader :registers
+    attr_reader :scratch
+    attr_reader :pc
+    attr_reader :stack
+    # is often a pseudo register (ie doesn't support move or other operations).
+    # Still, using it to express tests makes sense, not just for 
+    # consistency in this code, but also because that is what is actually done
+    attr_reader :status  
+
+    # conditions specify all the possibilities for branches. Branches are b +  condition
+    # Example:  beq means branch if equal. 
+    # :al means always, so bal is an unconditional branch (but b() also works)
+    CONDITIONS = [ :al , :eq , :ne , :lt , :le, :ge, :gt , :cs , :mi , :hi , :cc , :pl, :ls , :vc , :vs ]
+    
+    # here we create the shortcuts for the "standard" instructions, see above
+    # Derived machines may use own instructions and define functions for them if so desired
+    def initialize
+      [:push, :pop].each do |inst|
+        define_instruction_one(inst , StackInstruction)
+      end
+      [:adc, :add, :and, :bic, :eor, :orr, :rsb, :rsc, :sbc, :sub].each do |inst|
+        define_instruction_three(inst , LogicInstruction)
+      end
+      [:mov, :mvn].each do |inst|
+        define_instruction_two(inst , MoveInstruction)
+      end
+      [:cmn, :cmp, :teq, :tst].each do |inst|
+        define_instruction_two(inst , CompareInstruction)
+      end
+      [:strb, :str , :ldrb, :ldr].each do |inst|
+        define_instruction_three(inst , MemoryInstruction)
+      end
+      [:b, :call , :swi].each do |inst|
+        define_instruction_one(inst , CallInstruction)
+      end
+      # create all possible branch instructions, but the CallInstruction demangles the 
+      # code, and has opcode set to :b and :condition_code set to the condition
+      CONDITIONS.each do |suffix|
+        define_instruction_one("b#{suffix}".to_sym , CallInstruction)
+        define_instruction_one("call#{suffix}".to_sym , CallInstruction)
+      end
+    end
+
+    def create_method(name,  &block)
+        self.class.send(:define_method, name , &block)
+    end
+
+
+    def self.instance
+      @@instance
+    end
+    def self.instance= machine
+      @@instance = machine
+    end
+    def class_for clazz
+      my_module = self.class.name.split("::").first
+      clazz_name = clazz.name.split("::").last
+      # a derived machine (eg Arm) may provide its own class of the same name
+      if(my_module != "Vm" )
+        module_class = eval("#{my_module}::#{clazz_name}") rescue nil
+        clazz = module_class if module_class
+      end
+      clazz
+    end
+    
+    private
+    # defining the instruction (opcode, symbol) as a given class.
+    # the class is a Vm::Instruction derived base class and to create machine specific functions
+    #  an actual machine must create derived classes (from this base class) 
+    # These instruction classes must follow a naming pattern and take a hash in the constructor
+    #  Example, a mov() opcode  instantiates a Vm::MoveInstruction
+    #   for an Arm machine, a class Arm::MoveInstruction < Vm::MoveInstruction exists, and it will
+    #    be used to define the mov on an arm machine. 
+    # This method picks up that derived class and calls a define_instruction method that can 
+    #   be overridden in subclasses 
+    def define_instruction_one(inst , clazz ,  defaults = {} )
+      clazz =  self.class_for(clazz)
+      create_method(inst) do |first , options = nil|
+        # merge returns a new hash, so assign it (explicit options win over defaults)
+        options = defaults.merge(options || {})
+        options[:opcode] = inst
+        first = Vm::Integer.new(first) if first.is_a? Symbol
+        clazz.new(first , options)
+      end
+    end
+
+    # same for two args (left right, from to etc)
+    def define_instruction_two(inst , clazz ,  defaults = {} )
+      clazz =  self.class_for(clazz)
+      create_method(inst) do |left ,right , options = nil|
+        # merge returns a new hash, so assign it (explicit options win over defaults)
+        options = defaults.merge(options || {})
+        left = Vm::Integer.new(left) if left.is_a? Symbol
+        right = Vm::Integer.new(right) if right.is_a? Symbol
+        options[:opcode] = inst
+        clazz.new(left , right ,options)
+      end
+    end
+
+    # same for three args (result = left right,)
+    def define_instruction_three(inst , clazz ,  defaults = {} )
+      clazz =  self.class_for(clazz)
+      create_method(inst) do |result , left ,right = nil , options = nil|
+        # merge returns a new hash, so assign it (explicit options win over defaults)
+        options = defaults.merge(options || {})
+        options[:opcode] = inst
+        result = Vm::Integer.new(result) if result.is_a? Symbol
+        left = Vm::Integer.new(left) if left.is_a? Symbol
+        right = Vm::Integer.new(right) if right.is_a? Symbol
+        clazz.new(result, left , right ,options)
+      end
+    end
+  end
+end
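How the opcode shortcuts come into being can be sketched with `define_method` alone. `SketchInstruction`, `SketchMachine` and `define_shortcut` are hypothetical names; the sketch uses keyword options instead of the trailing options hash of the real `define_instruction_*` methods, purely for brevity.

```ruby
# Hypothetical stand-in; the real instruction classes live in instruction.rb.
class SketchInstruction
  attr_reader :operands, :attributes

  def initialize(operands, options)
    @operands = operands
    @attributes = options
  end
end

class SketchMachine
  def initialize
    [:mov, :mvn].each { |inst| define_shortcut(inst, SketchInstruction) }
  end

  # Defines a method named after the opcode. The method instantiates clazz
  # and passes its own name along as :opcode.
  def define_shortcut(inst, clazz, defaults = {})
    self.class.send(:define_method, inst) do |*operands, **options|
      clazz.new(operands, defaults.merge(options).merge(opcode: inst))
    end
  end
end

machine = SketchMachine.new
move = machine.mov(:r1, :r2, condition_code: :eq)
```

Calling `machine.mov` saves typing the class name and guarantees that `:opcode` always matches the method that was called, which is the point made in the comment at the top of `RegisterMachine`.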
--- a/lib/register/register_reference.rb
+++ b/lib/register/register_reference.rb
@ -0,0 +1,33 @@
+module Vm
+
+  # RegisterReference is not the name for a register, "only" for a certain use of it. 
+  # In a way it is like a variable name, a storage location. The location is a register of course, 
+  # but which register can be changed, and _all_ instructions sharing the RegisterReference then use that register
+  # In other words a simple level of indirection, or a change from value to reference semantics.
+
+  class RegisterReference
+    attr_accessor :symbol
+    def initialize r
+      if( r.is_a? Fixnum)
+        r = "r#{r}".to_sym
+      end
+      raise "wrong type for register init #{r}" unless r.is_a? Symbol
+      raise "double r #{r}" if r == :rr1
+      @symbol = r
+    end
+
+    def == other
+      return false if other.nil?
+      return false if other.class != RegisterReference
+      symbol == other.symbol
+    end
+
+    #helper method to calculate with register symbols
+    def next_reg_use by = 1
+      int = @symbol[1,3].to_i
+      sym = "r#{int + by}".to_sym
+      RegisterReference.new( sym )
+    end
+  end
+
+end
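The indirection that `RegisterReference` provides can be seen in a few lines. `SketchRegisterReference` and `SketchMove` are hypothetical, cut-down stand-ins for the classes in this commit.

```ruby
# Hypothetical cut-down copy of RegisterReference, just enough to show
# the reference semantics.
class SketchRegisterReference
  attr_accessor :symbol

  def initialize(symbol)
    @symbol = symbol
  end
end

SketchMove = Struct.new(:to, :from)

shared = SketchRegisterReference.new(:r1)
first  = SketchMove.new(shared, SketchRegisterReference.new(:r2))
second = SketchMove.new(SketchRegisterReference.new(:r3), shared)

# Re-assigning the register once is seen by every instruction that
# holds the same reference.
shared.symbol = :r5
```

Both `first.to` and `second.from` now report `:r5` without either instruction having been touched, which is what lets a later pass change the register assignment in one place.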