renaming, making space for extra layer

2014-06-25 02:33:44 +03:00
parent 2044a3e994
commit 9b39a3a816
28 changed files with 34 additions and 1 deletions
--- a/lib/neumann/README.markdown
+++ b/lib/neumann/README.markdown
@@ -0,0 +1,36 @@
+Von Neumann Machine 
+===============
+
+This is the logic that uses the generated AST to produce code, using the asm layer.
+
+Apart from shuffling things around from one layer to the other, it keeps track of registers and
+provides the stack glue. All the stuff a compiler would usually do.
+
+Also all syscalls are abstracted as functions.
+
+The Crystal Convention 
+----------------------
+
+Since we're not in C, we use the registers more suitably for our job:
+
+- return register is _not_ the same as passing registers
+- we pin one more register (ala stack/fp) for type information (this is used for returns too)
+- one line (8 registers) can be used by a function (caller saved?)
+- rest are scratch and may not hold values during call
+
+For Arm this works out as:
+- 0 type word (for the line)
+- 1-6 argument passing + workspace
+- 7 return value
+
+This means syscalls (using 7 for call number and 0 for return) must shuffle a little, but there's space to do it.
+Some more detail:
+
+1 - returning in the same register as passing makes that one register a special case, which i want to avoid. shuffling it gets tricky and involves 2 moves for what?
+As i see it the benefits of reusing the same register are one more argument register (not needed) and easy chaining of calls, which doesn't really happen so much.
+On the plus side, not using the same register makes saving and restoring registers easy (to implement and understand!). 
+An easy to understand policy is worth gold, as register mistakes are HARD to debug and not what i want to spend my time with just now. So that's settled.
+
+2 - Tagging integers like MRI/BB is a hack which does not extend to other types, such as floats. So we don't use that and instead carry type information externally to the value. This is a burden of course, but then so is tagging.
+The convention (to make it easier) is to handle data in lines (8 words) and have one of them carry the type info for the other 7. This is also the object layout and so we reuse that code on the stack.
+
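Note on the README above: the register roles it lists for Arm can be written down as a small lookup table. The following is only a sketch of that list. The constant `CRYSTAL_ARM_ROLES` and the helper `role_of` are hypothetical names made up for illustration and do not appear in this commit; treating everything above r7 as scratch is an assumption based on the "rest are scratch" bullet.

```ruby
# Hypothetical illustration of the Arm register roles listed in the README.
# Neither the constant nor the helper exists in the commit.
CRYSTAL_ARM_ROLES = {
  type_word:    [:r0],                          # type info for the line
  arguments:    [:r1, :r2, :r3, :r4, :r5, :r6], # argument passing + workspace
  return_value: [:r7]                           # deliberately not an argument register
}.freeze

# Look up the role of a register; anything not listed is assumed to be scratch.
def role_of(register)
  found = CRYSTAL_ARM_ROLES.find { |_role, regs| regs.include?(register) }
  found ? found.first : :scratch
end

puts role_of(:r0)
puts role_of(:r7)
```

Keeping the return register outside the argument range is what makes saving and restoring the argument registers uniform, which is the point made in detail 1 of the README.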
--- a/lib/neumann/block.rb
+++ b/lib/neumann/block.rb
@@ -0,0 +1,110 @@
+require_relative "values"
+
+module Vm
+  
+  # Think flowcharts: blocks are the boxes. The smallest unit of linear code
+  
+  # Blocks must end in control instructions (jump/call/return). 
+  # And the only valid argument for a jump is a Block 
+  
+  # Blocks form a linked list
+  
+  # There are three ways for a block to get data (to work on)
+  # - hard coded constants (embedded in code)
+  # - memory move
+  # - values passed in (from previous blocks. ie local variables)
+
+  # See Value description on how to create code/instructions
+  
+  # Codes then get assembled into bytes (after linking)
+  
+  class Block < Code
+
+    def initialize(name , function , next_block )
+      super()
+      @function = function
+      @name = name.to_sym
+      @next = next_block
+      @branch = nil
+      @codes = []
+      # keeping track of register usage, left (assigns) or right (uses)
+      @assigns = []
+      @uses = []
+    end
+
+    attr_reader :name  , :next , :codes , :function , :assigns , :uses
+    attr_accessor :branch
+    
+    def reachable ret = []
+      add_next ret
+      add_branch ret
+      ret
+    end
+
+    def add_code kode
+      kode.assigns.each { |a| (@assigns << a) unless @assigns.include?(a) }
+      kode.uses.each { |use| (@uses << use) unless (@assigns.include?(use) or @uses.include?(use)) }
+      #puts "IN ADD #{name}#{uses}" 
+      @codes << kode
+    end
+
+    def set_next next_b 
+      @next = next_b
+    end
+
+    # returns if this is a block that ends in a call (and thus needs local variable handling)
+    def call_block?
+      return false unless codes.last.is_a?(CallInstruction)
+      return false unless codes.last.opcode == :call
+      codes.dup.reverse.find{ |c| c.is_a? StackInstruction }
+    end
+
+    # Code interface follows. Note position is inherited as is from Code
+
+    # length of the block is the length of its codes, plus any next block (ie no branch follower)
+    #  Note, the next is in effect a linked list and as such may have many blocks behind it.
+    def length
+      cods = @codes.inject(0) {| sum  , item | sum + item.length}
+      cods += @next.length if @next
+      cods
+    end
+
+    # to link we link the codes (instructions), plus any next in line block (non- branched)
+    def link_at pos , context
+      super(pos , context)
+      @codes.each do |code|
+        code.link_at(pos , context)
+        pos += code.length
+      end
+      if @next
+        @next.link_at pos , context
+        pos += @next.length
+      end
+      pos
+    end
+
+    # assemble the codes (instructions) and any next in line block
+    def assemble(io)
+      @codes.each do |obj|
+        obj.assemble io
+      end
+      @next.assemble(io) if @next
+    end
+
+    private
+    # helper for determining reachable blocks 
+    def add_next ret
+      return if @next.nil?
+      return if ret.include? @next
+      ret << @next
+      @next.reachable ret
+    end
+    # helper for determining reachable blocks 
+    def add_branch ret
+      return if @branch.nil?
+      return if ret.include? @branch
+      ret << @branch
+      @branch.reachable ret
+    end
+  end
+end
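Note on block.rb above: `reachable` walks both the linear successor and the branch target, and the `include?` checks are what stop it from looping forever when blocks form a cycle (as a while loop does). A minimal standalone sketch of the same traversal follows. `MiniBlock` is a hypothetical stand-in for `Vm::Block` that keeps only the name and the two links, and it merges the two private helpers into one.

```ruby
# Hypothetical stand-in for Vm::Block, reduced to what the traversal needs.
class MiniBlock
  attr_reader :name
  attr_accessor :next_block, :branch

  def initialize(name, next_block = nil)
    @name = name
    @next_block = next_block
    @branch = nil
  end

  # Collect every block reachable from this one, linear successor first.
  def reachable(ret = [])
    visit(@next_block, ret)
    visit(@branch, ret)
    ret
  end

  private

  # Skip missing links and blocks already collected (this breaks cycles).
  def visit(block, ret)
    return if block.nil? || ret.include?(block)
    ret << block
    block.reachable(ret)
  end
end

tail  = MiniBlock.new(:tail)
loop_ = MiniBlock.new(:loop)
head  = MiniBlock.new(:head, tail)
head.branch  = loop_
loop_.branch = head   # jump back, forming a cycle
puts head.reachable.map(&:name).inspect
```

Note that a block on a cycle ends up in its own reachable list, which matches the behaviour of the code in the diff.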
--- a/lib/neumann/call_site.rb
+++ b/lib/neumann/call_site.rb
@@ -0,0 +1,37 @@
+module Vm
+
+  # name and args , return
+
+  class CallSite < Value
+
+    def initialize(name , value , args , function )
+      @name = name
+      @value = value    
+      @args = args
+      @function = function
+      raise "oh #{name} " unless value
+    end
+    attr_reader  :function , :args , :name , :value
+
+    def load_args into
+      if value.is_a?(IntegerConstant) or value.is_a?(ObjectConstant)
+        function.receiver.load into , value
+      else
+        raise "meta #{name} " if value.is_a? Boot::MetaClass
+        function.receiver.move( into, value ) if value.register_symbol != function.receiver.register_symbol
+      end
+      raise "function call '#{args.inspect}' has #{args.length} arguments, but function has #{function.args.length}" if args.length != function.args.length
+      args.each_with_index do |arg , index|
+        if arg.is_a?(IntegerConstant) or arg.is_a?(StringConstant)
+          function.args[index].load into , arg
+        else
+          function.args[index].move( into, arg ) if arg.register_symbol != function.args[index].register_symbol
+        end
+      end
+    end
+
+    def do_call into
+      RegisterMachine.instance.function_call into , self
+    end
+  end
+end
--- a/lib/neumann/code.rb
+++ b/lib/neumann/code.rb
@@ -0,0 +1,48 @@
+module Vm
+  # Base class for anything that we can assemble
+
+  # Derived classes include instructions and constants(data)
+  
+  # The commonality abstracted here is the length and position
+  # and the ability to assemble itself into the stream(io)
+  
+  # All code is position independent once assembled.
+  # But for jumps and calls two passes are necessary. 
+  # The first setting the position, the second assembling
+  class Code
+    
+    def class_for clazz
+      RegisterMachine.instance.class_for(clazz)
+    end
+
+    # set the position to zero, will have to reset later
+    def initialize
+      @position = 0
+    end
+
+    # the position in the stream. Think of it as an address if you want. The difference is small.
+    # Especially since we produce _only_ position independent code
+    # in other words, during assembly the position _must_ be resolved into a pc relative address
+    # and not used as is
+    def position
+      @position 
+    end
+    
+    # The containing class (assembler/function) calls this to tell the instruction/data where it is in the
+    # stream. During assembly the position is then used to calculate pc relative addresses.
+    def link_at address , context
+      @position = address
+    end
+    
+    # length for this code in bytes
+    def length
+      raise "Not implemented #{inspect}"
+    end
+    
+    # we pass the io (usually string_io) in for the code to assemble itself.
+    def assemble(io)
+      raise "Not implemented #{self.inspect}"
+    end
+    
+  end
+end
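Note on code.rb above: the two passes mentioned in the comments can be sketched in isolation. The first pass hands every item its position, and only then can a jump compute the distance to its target. `Item`, `link_all` and `relative_offset` are hypothetical names for illustration. The sketch also ignores any architecture-specific pc bias (on Arm the pc reads ahead of the current instruction), so the offset shown is a plain position difference.

```ruby
# Hypothetical stand-in for Vm::Code: something with a length and a position.
class Item
  attr_reader :position, :length

  def initialize(length)
    @length = length
    @position = 0 # placeholder until linking, as in Code#initialize
  end

  def link_at(address)
    @position = address
  end
end

# Pass one: assign consecutive positions and return the end position.
def link_all(items, start = 0)
  pos = start
  items.each do |item|
    item.link_at(pos)
    pos += item.length
  end
  pos
end

# Pass two would use this: distance from the jumping item to its target.
def relative_offset(from, to)
  to.position - from.position
end

items = [Item.new(4), Item.new(8), Item.new(4)]
total = link_all(items)
puts total
puts relative_offset(items[0], items[2])
```

Because only differences between positions are used, the same assembled bytes work wherever the stream is loaded, which is the position independence the comments describe.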
--- a/lib/neumann/context.rb
+++ b/lib/neumann/context.rb
@@ -0,0 +1,16 @@
+
+module Vm
+  
+  #currently just holding the object_space in here so we can have global access
+  class Context
+    
+    def initialize object_space
+      @object_space = object_space
+      @locals = {}
+    end
+    attr_reader :attributes ,:object_space
+
+    attr_accessor :current_class , :locals , :function
+    
+  end
+end
--- a/lib/neumann/function.rb
+++ b/lib/neumann/function.rb
@@ -0,0 +1,185 @@
+require_relative "block"
+require_relative "passes"
+
+module Vm
+
+  # Functions are similar to Blocks. Where Blocks can be jumped to, Functions can be called.
+
+  # Functions also have arguments and a return. These are Value subclass instances, ie specify
+  #   type (by class type) and register by instance
+
+  # They also have local variables. Args take up the first n regs, then locals the rest. No 
+  #  direct manipulating of registers (ie specifying the number) should be done.
+
+  # Code-wise Functions are made up from a list of Blocks, in a similar way blocks are made up of codes
+  # Four of the blocks have a special role:
+  # - entry/exit: are usually system specific
+  # - body:  the logical start of the function
+  # - return: the logical end, where ALL blocks must end
+  
+  # Blocks can be linked in two ways:
+  # - linear:  flow continues from one to the next as they are sequential both logically and "physically"
+  #           use the block set_next for this. 
+  #           This is "the straight line", there must be a continuous sequence from body to return
+  #           Linear blocks may be created from an existing block with new_block
+  # - branched: You create new blocks using function.new_block which gets added "after" return
+  #            These (eg if/while) blocks may themselves have linear blocks, but the last of these 
+  #            MUST have an unconditional branch. And remember, all roads lead to return.
+  
+  class Function < Code
+
+    def initialize(name , receiver = Vm::Reference , args = [] , return_type = Vm::Reference)
+      super()
+      @name = name.to_sym
+      if receiver.is_a?(Value)
+        @receiver = receiver
+        raise "arg in non std register #{receiver.inspect}" unless RegisterMachine.instance.receiver_register == receiver.register_symbol
+      else
+        puts receiver.inspect
+        @receiver = receiver.new(RegisterMachine.instance.receiver_register)
+      end
+      
+      @args = Array.new(args.length)
+      args.each_with_index do |arg , i|
+        shouldda = RegisterReference.new(RegisterMachine.instance.receiver_register).next_reg_use(i + 1)
+        if arg.is_a?(Value)
+          @args[i] = arg
+          raise "arg #{i} in non std register #{arg.register}, expecting #{shouldda}" unless shouldda == arg.register
+        else
+          @args[i] = arg.new(shouldda)
+        end
+      end
+      set_return return_type
+      @exit =  RegisterMachine.instance.function_exit( Vm::Block.new("exit" , self , nil) , name )
+      @return =  Block.new("return", self , @exit)
+      @body =  Block.new("body", self , @return)
+      @insert_at = @body
+      @entry = RegisterMachine.instance.function_entry( Vm::Block.new("entry" , self , @body) ,name )
+      @locals = []
+    end
+
+    attr_reader :args , :entry , :exit , :body , :name , :return_type , :receiver 
+    
+    def insertion_point
+      @insert_at
+    end
+    def set_return type_or_value
+      @return_type = type_or_value || Vm::Reference 
+      if @return_type.is_a?(Value)
+        raise "return in non std register #{@return_type.inspect}" unless RegisterMachine.instance.return_register == @return_type.register_symbol
+      else
+        @return_type = @return_type.new(RegisterMachine.instance.return_register)
+      end
+    end
+    def arity
+      @args.length
+    end
+
+    def new_local type = Vm::Integer
+      register = args.length + 3 + @locals.length # three for the receiver, return and type regs
+      l = type.new(register)     #so start at r3
+      #puts "new local #{l.register_symbol}"
+      raise "Register overflow in function #{name}" if register >= 13 # yep, 13 is bad luck
+      @locals << l
+      l
+    end
+    
+    # return a list of registers that are still in use after the given block
+    # a call_site uses pushes and pops these to make them available for code after a call
+    def locals_at l_block
+      used =[]
+      # call assigns the return register, but as it is in l_block, it is not asked.
+      assigned = [ RegisterReference.new(Vm::RegisterMachine.instance.return_register) ]
+      l_block.reachable.each do |b|
+        b.uses.each {|u|
+          (used << u) unless assigned.include?(u) 
+        }
+        assigned += b.assigns
+      end
+      used.uniq
+    end
+
+    # return a list of the blocks that are addressable, ie entry and @blocks and all next
+    def blocks
+      ret = []
+      b = @entry
+      while b
+        ret << b
+        b = b.next
+      end  
+      ret
+    end
+
+    # when control structures create new blocks (with new_block) control continues at some new block
+    # the control structure creates. 
+    # Example: while, needs  2 extra blocks
+    #          1 condition code, must be its own block as we jump back to it
+    #           -       the body, can actually be after the condition as we don't need to jump there
+    #          2 after while block. Condition jumps here 
+    # After block 2, the function is linear again and the calling code does not need to know what happened
+    
+    # But subsequent statements are still using the original block (self) to add code to
+    # So the while expression creates the extra blocks, adds them and the code and then "moves" the insertion point along
+    def insert_at block
+      @insert_at = block
+      self
+    end
+
+    # create a new linear block after the current insertion block. 
+    # Linear means there is no branch needed from that one to the new one. 
+    # Usually the new one just serves as jump address for a control statement
+    # In code generation (assembly) , the new block is written after this one, ie zero runtime cost
+    # This does _not_ change the insertion point, that has to be done with insert_at(block)
+    def new_block new_name
+      block_name = "#{@insert_at.name}_#{new_name}"
+      new_b = Block.new( block_name , self , @insert_at.next )
+      @insert_at.set_next new_b
+      return new_b
+    end
+
+    def add_code(kode)
+      raise "alarm #{kode}" if kode.is_a? Word
+      raise "alarm #{kode.class} #{kode}" unless kode.is_a? Code
+      @insert_at.add_code kode
+      self
+    end
+
+    # sugar to create instructions easily. 
+    # any method will be passed on to the RegisterMachine and the result added to the insertion block
+    #  With this trick we can write what looks like assembler, 
+    #  Example   func.instance_eval
+    #                mov( r1 , r2 )
+    #                add( r1 , r2 , 4)
+    # end
+    #           mov and add will be called on Machine and generate Instructions that are then added 
+    #             to the current block
+    # also symbols are supported and wrapped as register usages (for bare metal programming)
+    def method_missing(meth, *args, &block)
+      add_code RegisterMachine.instance.send(meth , *args)
+    end
+
+    # following is the Code interface
+    
+    # to link we link the entry and then any blocks. The entry links the straight line
+    def link_at address , context
+      super #just sets the position
+      @entry.link_at address , context
+    end
+
+    # position of the function is the position of the entry block
+    def position
+      @entry.position
+    end
+
+    # length of a function is the entry block length (includes the straight line behind it) 
+    # plus any out of line blocks that have been added
+    def length
+      @entry.length
+    end
+    
+    # assembling assembles the entry (straight line/ no branch line) + any additional branches
+    def assemble io
+      @entry.assemble(io)
+    end
+  end
+end
--- a/lib/neumann/instruction.rb
+++ b/lib/neumann/instruction.rb
@@ -0,0 +1,206 @@
+require_relative "code"
+module Vm
+
+  # Because the idea of what one instruction does, does not always map one to one to real machine
+  # instructions, an instruction may link to another instruction thus creating an arbitrary list
+  # to get the job (the original instruction) done
+  
+  # Admittedly it would be simpler just to create the (abstract) instructions and let the machine 
+  # encode them into whatever is necessary, but this approach leaves more possibility to 
+  # optimize the actual instruction stream (not just the crystal instruction stream). Makes sense?
+  
+  # We have basic classes (literally) of instructions
+  # - Memory
+  # - Stack
+  # - Logic
+  # - Math
+  # - Control/Compare
+  # - Move
+  # - Call
+  
+  # Instruction derives from Code, for the assembly api
+  
+  class Instruction < Code    
+    def initialize  options
+      @attributes = options
+    end
+    attr_reader :attributes
+    def opcode
+      @attributes[:opcode]
+    end
+    #abstract, only should be called from derived
+    def to_s
+      atts = @attributes.dup
+      atts.delete(:opcode)
+      atts.delete(:update_status)
+      atts.delete(:condition_code) if atts[:condition_code] == :al
+      atts.empty? ? "" : ", #{atts}"
+    end
+    # returns an array of registers (RegisterReferences) that this instruction uses.
+    # ie for r1 = r2 + r3 
+    # which in assembler is add r1 , r2 , r3
+    # it would return [r2,r3]
+    # for pushes the list may be longer, whereas for a jump empty
+    def uses
+      raise "abstract called for #{self.class}"
+    end
+    # returns an array of registers (RegisterReferences) that this instruction assigns to.
+    # ie for r1 = r2 + r3 
+    # which in assembler is add r1 , r2 , r3
+    # it would return [r1]
+    # for most instruction this is one, but comparisons and jumps 0 , and pop's as long as 16
+    def assigns
+      raise "abstract called for #{self.class}"
+    end
+    def method_missing name , *args , &block 
+      return super unless (args.length <= 1) or block_given?
+      set , attribute = name.to_s.split("set_")
+      if set == ""
+        @attributes[attribute.to_sym] = args[0] || 1
+        return self 
+      else
+        return super
+      end
+      return @attributes[name.to_sym]
+    end
+  end
+  
+  class StackInstruction < Instruction
+    def initialize first , options = {}
+      @first = first
+      super(options)
+    end
+    # when calling we place a dummy push/pop in the stream and calculate later what registers actually need saving 
+    def set_registers regs
+      @first = regs.collect{ |r| r.symbol }
+    end
+    def is_push?
+      opcode == :push
+    end
+    def is_pop?
+      !is_push?
+    end
+    def uses
+      is_push? ? regs : []
+    end
+    def assigns
+      is_pop? ? regs : []
+    end
+    def regs
+      @first
+    end
+    def to_s
+      "#{opcode} [#{@first.collect {|f| f.to_asm}.join(',') }] #{super}"
+    end
+  end
+  class MemoryInstruction < Instruction
+    def initialize result , left , right = nil , options = {}
+      @result = result
+      @left = left
+      @right = right
+      super(options)
+    end
+    def uses
+      ret = [@left.register ]
+      ret << @right.register unless @right.nil?
+      ret
+    end
+    def assigns
+      [@result.register]
+    end
+  end
+  class LogicInstruction < Instruction
+    #  result = left op right
+    # 
+    # Logic instruction are your basic operator implementation. But unlike the (normal) code we write
+    #    these Instructions must have "place" to write their results. Ie when you write 4 + 5 in ruby
+    #    the result is sort of up in the air, but with Instructions the result must be assigned 
+    def initialize result , left , right , options = {}
+      @result = result
+      @left = left
+      @right = right.is_a?(Integer) ? IntegerConstant.new(right) : right
+      super(options)
+    end
+    attr_accessor :result , :left ,  :right
+    def uses
+      ret = []
+      ret << @left.register if @left and not @left.is_a? Constant
+      ret << @right.register if @right and not @right.is_a?(Constant)
+      ret
+    end
+    def assigns
+      [@result.register]
+    end
+    def to_s
+      "#{opcode} #{result.to_asm} , #{left.to_asm} , #{right.to_asm} #{super}"
+    end
+  end
+  class CompareInstruction < Instruction
+    def initialize left , right , options  = {}
+      @left = left
+      @right = right.is_a?(Integer) ? IntegerConstant.new(right) : right
+      super(options)
+    end
+    def uses
+      ret = [@left.register ]
+      ret << @right.register unless @right.is_a? Constant
+      ret
+    end
+    def assigns
+      []
+    end
+    def to_s
+      "#{opcode} #{@left.to_asm} , #{@right.to_asm} #{super}"
+    end
+  end
+  class MoveInstruction < Instruction
+    def initialize to , from , options = {}
+      @to = to
+      @from = from.is_a?(Integer) ? IntegerConstant.new(from) : from
+      raise "move must have from set #{inspect}" unless from
+      super(options)
+    end
+    attr_accessor :to , :from
+    def uses
+      @from.is_a?(Constant) ? [] : [@from.register]
+    end
+    def assigns
+      [@to.register]
+    end
+    def to_s
+      "#{opcode} #{@to.to_asm} , #{@from.to_asm} #{super}"
+    end
+  end
+  class CallInstruction < Instruction
+    def initialize first , options  = {}
+      @first = first
+      super(options)
+      opcode = @attributes[:opcode].to_s
+      if opcode.length == 3 and opcode[0] == "b"
+        @attributes[:condition_code] = opcode[1,2].to_sym
+        @attributes[:opcode] = :b
+      end
+      if opcode.length == 6 and opcode[0] == "c"
+        @attributes[:condition_code] = opcode[4,2].to_sym
+        @attributes[:opcode] = :call
+      end
+    end
+    def uses
+      if opcode == :call
+        @first.args.collect {|arg| arg.register }
+      else
+        []
+      end
+    end
+    def assigns
+      if opcode == :call
+        [RegisterReference.new(RegisterMachine.instance.return_register)]
+      else
+        []
+      end
+    end
+    def to_s
+      "#{opcode} #{@first.to_asm} #{super}"
+    end
+  end
+end
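Note on instruction.rb above: the `uses` and `assigns` lists of each instruction are what `Block#add_code` folds into a per-block summary. A register counts as used by the block only if the block reads it before having written it. The sketch below reproduces that accumulation on plain symbols. `Op` and `summarize` are hypothetical names for illustration, and registers are symbols here rather than `RegisterReference` objects.

```ruby
# Hypothetical stand-in for an instruction: just its two register lists.
Op = Struct.new(:assigns, :uses)

# Fold a sequence of ops into block-level assigns and uses,
# in the same order Block#add_code does (assigns first, then uses).
def summarize(ops)
  assigns = []
  uses = []
  ops.each do |op|
    op.assigns.each { |a| assigns << a unless assigns.include?(a) }
    op.uses.each do |u|
      uses << u unless assigns.include?(u) || uses.include?(u)
    end
  end
  { assigns: assigns, uses: uses }
end

ops = [
  Op.new([:r1], [:r2, :r3]), # add r1 , r2 , r3
  Op.new([:r4], [:r1, :r5])  # add r4 , r1 , r5  (r1 was written above)
]
puts summarize(ops).inspect
```

One consequence of recording assigns before uses is that an instruction reading and writing the same register (for example add r1 , r1 , r2) does not report that register as used. Whether that is intended is not stated in the commit.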
--- a/lib/neumann/passes.rb
+++ b/lib/neumann/passes.rb
@@ -0,0 +1,108 @@
+module Vm
+  # Passes, or BlockPasses, could have been procs that just get each block passed.
+  # Instead they are proper objects in case they want to save state.
+  # The idea is 
+  # - reduce noise in the main code by having this code separately (aspect/concern style)
+  # - abstract the iteration
+  # - allow not yet written code to hook in
+  
+  class RemoveStubs
+    def run block
+      block.codes.dup.each_with_index do |kode , index|
+        next unless kode.is_a? StackInstruction
+        if kode.regs.empty?
+          block.codes.delete(kode) 
+          puts "deleted stack instruction in #{block.name}"
+        end
+      end
+    end
+  end
+
+  # Operators eg a + b , must assign their result somewhere and as such create temporary variables.
+  # but if code is c = a + b , the generated instructions would be more like tmp = a + b ; c = tmp
+  # So if there is a move instruction just after a logic instruction where the result of the logic
+  # instruction is moved straight away, we can undo that mess and remove one instruction.
+  class LogicMoveReduction
+    def run block
+      org = block.codes.dup
+      org.each_with_index do |kode , index|
+        n = org[index+1]
+        next if n.nil?
+        next unless kode.is_a? LogicInstruction
+        next unless n.is_a? MoveInstruction
+        # specific arm instructions, don't optimize as don't know what the extra mean
+        # small todo. This does not catch condition_code that are not :al
+        next if (n.attributes.length > 3) or (kode.attributes.length > 3)
+        if kode.result == n.from
+          puts "Logic reduction #{kode} removes #{n}"
+          kode.result = n.to
+          block.codes.delete(n)
+        end
+      end
+    end
+  end
+
+  # Sometimes there are double moves ie mov a, b and mov b , c . We reduce that to move a , c 
+  # (but don't check if that improves register allocation. Yet ?) 
+  class MoveMoveReduction
+    def run block
+      org = block.codes.dup
+      org.each_with_index do |kode , index|
+        n = org[index+1]
+        next if n.nil?
+        next unless kode.is_a? MoveInstruction
+        next unless n.is_a? MoveInstruction
+        # specific arm instructions, don't optimize as don't know what the extra mean
+        # small todo. This does not catch condition_code that are not :al
+        next if (n.attributes.length > 3) or (kode.attributes.length > 3)
+        if kode.to == n.from
+          puts "Move reduction #{kode}: removes #{n} "
+          kode.to = n.to
+          block.codes.delete(n)
+        end
+      end
+    end
+  end
+
+  #As the name says, remove no-ops. Currently mov x , x supported
+  class NoopReduction
+    def run block
+      block.codes.dup.each_with_index do |kode , index|
+        next unless kode.is_a? MoveInstruction
+        # specific arm instructions, don't optimize as don't know what the extra mean
+        # small todo. This does not catch condition_code that are not :al
+        next if (kode.attributes.length > 3)
+        if kode.to == kode.from
+          block.codes.delete(kode) 
+          puts "deleted noop move in #{block.name} #{kode}"
+        end
+      end
+    end
+  end
+
+  # We insert push/pops as dummies to fill them later in SaveLocals
+  # as we can not know ahead of time which locals will be live in the code to come
+  # and also we don't want to "guess" later where the push/pops should be
+  
+  # Here we check which registers need saving and add them
+  # Or sometimes just remove the push/pops, when no locals needed saving
+  class SaveLocals
+    def run block
+      push = block.call_block?
+      return unless push
+      return unless block.function
+      locals = block.function.locals_at block
+      pop = block.next.codes.first
+      if(locals.empty?)
+        #puts "Empty #{block.name}"
+        block.codes.delete(push)
+        block.next.codes.delete(pop)
+      else
+        #puts "PUSH #{push}"
+        push.set_registers(locals)
+        #puts "POP #{pop}"
+        pop.set_registers(locals)
+      end
+    end
+  end
+end
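Note on passes.rb above: the double-move reduction can be tried out on plain data. The sketch below is a simplified variant, not the pass from the commit. It builds a new list instead of mutating instructions in place, and it collapses whole chains of moves. `Move` and `reduce_double_moves` are hypothetical names for illustration. Like the original, it assumes the intermediate register is not read again later, which it does not check.

```ruby
# Hypothetical stand-in for MoveInstruction: destination and source register.
Move = Struct.new(:to, :from)

# Collapse "mov b , a" followed by "mov c , b" into "mov c , a".
# Assumption: b is dead after the second move (not verified here).
def reduce_double_moves(moves)
  reduced = []
  moves.each do |move|
    previous = reduced.last
    if previous && previous.to == move.from
      reduced[-1] = Move.new(move.to, previous.from)
    else
      reduced << move
    end
  end
  reduced
end

moves = [Move.new(:r2, :r1), Move.new(:r3, :r2), Move.new(:r5, :r4)]
reduce_double_moves(moves).each { |m| puts "mov #{m.to} , #{m.from}" }
```

Building a fresh list avoids deleting from an array while iterating over a copy of it, which makes the effect of chained moves easier to reason about.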
--- a/lib/neumann/plock.rb
+++ b/lib/neumann/plock.rb
@@ -0,0 +1,64 @@
+module Vm
+  #Plock (Proc-Block) is mostly a Block but also somewhat Proc-ish: A Block that carries data.
+  #
+  # Data in a Block is useful in the same way data in objects is. Plocks being otherwise just code.
+  #
+  # But the concept is not quite straightforward: If one thinks of a Plock embedded in a normal function,
+  # the data in the Plock would be static data. In OO terms this comes quite close to a Proc, if the data is the local
+  # variables. Quite possibly they shall be used to implement procs, but that is not the direction now.
+  #
+  # For now we use Plocks behind the scenes as it were. In the code that you never see, method invocation mainly.
+  # 
+  # In terms of implementation the Plock is a Block with data (Not too much data, mainly a couple of references).
+  # The block writes its instructions as normal, but a jump is inserted as the last instruction. The jump is to the 
+  # next block, over the data that is inserted after the block code (and so before the next)
+  #
+  # It follows that Plocks should be linear blocks.
+  class Plock < Block
+    
+    def initialize(name , function , next_block )
+      super
+      @data = []
+      @branch_code = RegisterMachine.instance.b next_block
+    end
+
+    def set_next next_b
+      super
+      @branch_code = RegisterMachine.instance.b next_b
+    end
+
+    # Data gets assembled after functions
+    def add_data o
+      return if @data.include? o
+      raise "must be derived from Code #{o.inspect}" unless o.is_a? Vm::Code
+      @data << o # TODO check type , no basic values allowed (must be wrapped)
+    end
+
+    # Code interface follows. Note position is inherited as is from Code
+
+    # length of the Plock is the length of the block, plus the branch, plus data.
+    def length
+      len = @data.inject(super) {| sum  , item | sum + item.length}
+      len + @branch_code.length
+    end
+
+    # again, super +  branch plus data
+    def link_at pos , context
+      super(pos , context)
+      @branch_code.link_at pos , context
+      @data.each do |code|
+        code.link_at(pos , context)
+        pos += code.length
+      end
+    end
+
+    # again, super +  branch plus data
+    def assemble(io)
+      super
+      @branch_code.assemble(io)
+      @data.each do |obj|
+        obj.assemble io
+      end
+    end
+  end
+end
--- a/lib/neumann/register_machine.rb
+++ b/lib/neumann/register_machine.rb
@@ -0,0 +1,143 @@
+module Vm
+
+  # Our virtual c-machine has a number of registers of a given size and uses a stack
+  # So much so standard
+  # But our machine is oo, meaning that the register contents is typed. 
+  # Of course current hardware does not have that (a perceived issue), but for our machine we pretend.
+  # So internally we have at least 8 word registers, one of which is used to keep track of types*
+  # and any number of scratch registers
+  # but externally it's all Values (see there)
+  
+  # * Note that register content is typed externally. Not as in mri, where int's are tagged. Floats can't
+  #   be tagged and lambda should be its own type, so tagging does not work
+  
+  # A Machines main responsibility in the framework is to instantiate Instruction
+
+  # Value functions are mapped to machines by concatenating the values class name + the method name
+  # Example:  IntegerValue.plus( value ) ->  Machine.signed_plus (value )
+  
+  # Also, shortcuts are created to easily instantiate Instruction objects. The "standard" set of instructions
+  # (arm-influenced) provides for normal operations on a register machine, 
+  # Example:  pop -> StackInstruction.new( {:opcode => :pop}.merge(options) )
+  # Instructions work with options, so you can pass anything in, and the only thing the functions does
+  # is save you typing the clazz.new. It passes the function name as the :opcode
+   
+  class RegisterMachine
+  
+    # hmm, not pretty but for now
+    @@instance = nil
+    
+    attr_reader :registers
+    attr_reader :scratch
+    attr_reader :pc
+    attr_reader :stack
+    # is often a pseudo register (ie doesn't support move or other operations).
+    # Still, using it to express tests makes sense, not just for 
+    # consistency in this code, but also because that is what is actually done
+    attr_reader :status  
+
+    # conditions specify all the possibilities for branches. Branches are b +  condition
+    # Example:  beq means branch if equal. 
+    # :al means always, so bal is an unconditional branch (but b() also works)
+    CONDITIONS = [ :al , :eq , :ne , :lt , :le, :ge, :gt , :cs , :mi , :hi , :cc , :pl, :ls , :vc , :vs ]
+    
+    # here we create the shortcuts for the "standard" instructions, see above
+    # Derived machines may use own instructions and define functions for them if so desired
+    def initialize
+      [:push, :pop].each do |inst|
+        define_instruction_one(inst , StackInstruction)
+      end
+      [:adc, :add, :and, :bic, :eor, :orr, :rsb, :rsc, :sbc, :sub].each do |inst|
+        define_instruction_three(inst , LogicInstruction)
+      end
+      [:mov, :mvn].each do |inst|
+        define_instruction_two(inst , MoveInstruction)
+      end
+      [:cmn, :cmp, :teq, :tst].each do |inst|
+        define_instruction_two(inst , CompareInstruction)
+      end
+      [:strb, :str , :ldrb, :ldr].each do |inst|
+        define_instruction_three(inst , MemoryInstruction)
+      end
+      [:b, :call , :swi].each do |inst|
+        define_instruction_one(inst , CallInstruction)
+      end
+      # create all possible branch instructions, but the CallInstruction demangles the 
+      # code, and has opcode set to :b and :condition_code set to the condition
+      CONDITIONS.each do |suffix|
+        define_instruction_one("b#{suffix}".to_sym , CallInstruction)
+        define_instruction_one("call#{suffix}".to_sym , CallInstruction)
+      end
+    end
+
+    def create_method(name,  &block)
+        self.class.send(:define_method, name , &block)
+    end
+
+
+    def self.instance
+      @@instance
+    end
+    def self.instance= machine
+      @@instance = machine
+    end
+    def class_for clazz
+      c_name = clazz.name
+      my_module = self.class.name.split("::").first
+      clazz_name = clazz.name.split("::").last
+      if(my_module != "Vm" )   # my_module is a String, so compare against the module's name
+        module_class = eval("#{my_module}::#{clazz_name}") rescue nil
+        clazz = module_class if module_class
+      end
+      clazz
+    end
+    
+    private
+    # defining the instruction (opcode, symbol) as a given class.
+    # the class is a Vm::Instruction derived base class and to create machine specific function
+    #  an actual machine must create derived classes (from this base class) 
+    # These instruction classes must follow a naming pattern and take a hash in the constructor
+    #  Example, a mov() opcode  instantiates a Vm::MoveInstruction
+    #   for an Arm machine, a class Arm::MoveInstruction < Vm::MoveInstruction exists, and it will
+    #    be used to define the mov on an arm machine. 
+    # This method picks up that derived class and calls a define_instruction method that can 
+    #   be overridden in subclasses 
+    def define_instruction_one(inst , clazz ,  defaults = {} )
+      clazz =  self.class_for(clazz)
+      create_method(inst) do |first , options = nil|
+        options = {} if options == nil
+        options = defaults.merge(options)   # merge returns a new hash; passed options win over defaults
+        options[:opcode] = inst
+        first = Vm::Integer.new(first) if first.is_a? Symbol
+        clazz.new(first , options)
+      end
+    end
+
+    # same for two args (left right, from to etc)
+    def define_instruction_two(inst , clazz ,  defaults = {} )
+      clazz =  self.class_for(clazz)
+      create_method(inst) do |left ,right , options = nil|
+        options = {} if options == nil
+        options = defaults.merge(options)   # merge returns a new hash; passed options win over defaults
+        left = Vm::Integer.new(left) if left.is_a? Symbol
+        right = Vm::Integer.new(right) if right.is_a? Symbol
+        options[:opcode] = inst
+        clazz.new(left , right ,options)
+      end
+    end
+
+    # same for three args (result = left op right)
+    def define_instruction_three(inst , clazz ,  defaults = {} )
+      clazz =  self.class_for(clazz)
+      create_method(inst) do |result , left ,right = nil , options = nil|
+        options = {} if options == nil
+        options = defaults.merge(options)   # merge returns a new hash; passed options win over defaults
+        options[:opcode] = inst
+        result = Vm::Integer.new(result) if result.is_a? Symbol
+        left = Vm::Integer.new(left) if left.is_a? Symbol
+        right = Vm::Integer.new(right) if right.is_a? Symbol
+        clazz.new(result, left , right ,options)
+      end
+    end
+  end
+end
--- a/lib/neumann/register_reference.rb
+++ b/lib/neumann/register_reference.rb
@@ -0,0 +1,33 @@
+module Vm
+
+  # RegisterReference is not the name for a register, "only" for a certain use of it. 
+  # In a way it is like a variable name, a storage location. The location is a register of course, 
+  # but which register can be changed, and _all_ instructions sharing the RegisterReference then use that register.
+  # In other words a simple level of indirection, or change from value to reference semantics.
+
+  class RegisterReference
+    attr_accessor :symbol
+    def initialize r
+      if( r.is_a? Fixnum)
+        r = "r#{r}".to_sym
+      end
+      raise "wrong type for register init #{r}" unless r.is_a? Symbol
+      raise "double r #{r}" if r == :rr1
+      @symbol = r
+    end
+
+    def == other
+      return false if other.nil?
+      return false if other.class != RegisterReference
+      symbol == other.symbol
+    end
+
+    #helper method to calculate with register symbols
+    def next_reg_use by = 1
+      int = @symbol[1,3].to_i
+      sym = "r#{int + by}".to_sym
+      RegisterReference.new( sym )
+    end
+  end
+
+end
--- a/lib/neumann/values.rb
+++ b/lib/neumann/values.rb
@@ -0,0 +1,52 @@
+require_relative "code"
+require_relative "register_reference"
+
+module Vm
+  
+  # Values represent the information as it is processed. Different subclasses for different types, 
+  # each type with different operations.
+  # The operations on values are what make a machine do things. Operations are captured as 
+  # subclasses of Instruction and saved to Blocks
+  
+  # Values are a way to reason about (create/validate) instructions. 
+  
+  # Word Values are what fits in a register. Derived classes
+  # Float, Reference , Integer(s) must fit the same registers
+  
+  # just a base class for data. not sure how this will be useful (may just have read too much LLVM)
+  class Value 
+    def class_for clazz
+      RegisterMachine.instance.class_for(clazz)
+    end
+  end
+
+  # Just a nice way to write branches
+  # Comparisons produce them, and branches take them as argument.
+  class BranchCondition < Value
+
+    def initialize operator
+      @operator = operator
+    end
+    attr_accessor :operator
+    #needed to check the opposite, ie not true
+    def not_operator
+      case @operator
+      when :le
+        :gt
+      when :gt
+        :le
+      when :lt
+        :ge
+      when :eq
+        :ne
+      else
+        raise "not implemented #{@operator}"
+      end
+    end
+  end
+end
+require_relative "values/constants"
+require_relative "values/word"
+require_relative "values/integer"
+require_relative "values/reference"
+require_relative "values/mystery"