diff --git a/_layouts/site.html b/_layouts/site.html index 348025c..b5f974f 100644 --- a/_layouts/site.html +++ b/_layouts/site.html @@ -38,7 +38,7 @@ Architecture
  • - Machine layer + Typed layer
  • Arm Resources diff --git a/arm/overview.md b/arm/overview.md index 066acc6..3715a28 100644 --- a/arm/overview.md +++ b/arm/overview.md @@ -10,7 +10,7 @@ a collection of helpful resources (links and specs) with sometimes very very bri So why learn assembler, after all, it's likely you spent your programmers life avoiding it: - - Some things can not be expressed in Soml + - Some things can not be expressed in ruby - To speed things up. - To add cpu specific capabilities @@ -31,7 +31,7 @@ And off course there is the overwhelming arm infocenter, [here with it's bizarre The full 750 page specification for the pi , the [ARM1176JZF-S pdf is here](/arm/big_spec.pdf) or [online](http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0553a/BABFADHJ.html) -A nice list of [Kernel calls](http://docs.cs.up.ac.za/programming/asm/derick_tut/syscalls.html) +A nice list of [Kernel calls](http://docs.cs.up.ac.za/programming/asm/derick_tut/syscalls.html) ## Virtual pi And since not everyone has access to an arm, here is a description how to set up an [emulated pi](/arm/qemu.html) diff --git a/salama/layers.html b/salama/layers.html deleted file mode 100644 index 445ec03..0000000 --- a/salama/layers.html +++ /dev/null @@ -1,152 +0,0 @@ ---- -layout: salama -title: Salama architectural layers ---- - - -
    -

    Main Layers

    -

    - To implement an object system to execute object oriented languages takes a large system. - The parts or abstraction layers are detailed below.
    - It is important to undrstand the approach first though, as it differs from the normal - interpretation. The idea is to compile (eg) ruby. It may be easiest to compare to a static - object oriented language like c++. When c++ was created c++ code was translated into c, which - then gets translated into assembler, which gets translated to binary code, which is linked - and executed. Compiling to binaries is what gives these languages speed, and is one reason - to compile ruby.
    - In a similar way to the c++ example, we need language between ruby and assembler, as it is too - big a mental step from ruby to assembler. Off course course one could try to compile to c, but - since c is not object oriented that would mean dealing with all off c's non oo heritance, like - linking model, memory model, calling convention etc. (more on this in the book)
    - The layers are: -

    -

    -
    - -
    -
    Binary , Arm and Elf
    -

    - A physical machine will run binaries containing intructions that the cpu understands. With arm - being our main target, this means we need code to produce binary, which is contained in a - seperate module salama-arm .
    - To be able to run code on a unix based operating system, binaries need to be packaged in a - way that the os understands, so minimal elf support is included in the package.
    - Arm is a risc architecture, but anyone who knows it will attest, with it's own quirks. - For example any instruction may be executed conditionally in arm. Or there is no 32bit - register load instruction. It is possible to create very dense code using all the arm - special features, but this is not implemented yet. -

    -
    - -
    -
    Register Machine
    -

    - The Register machine layer is a relatively close abstraction of risc hardware, but without the - quirks. -
    - The register machine has registers, indexed addressing, operators, branches and everything - needed for the next layer. It doesn not try to abstract every possible machine leature - (like llvm), but rather "objectifies" the risc view to provide what is needed for soml, the - next layer up. -
    - The machine has it's own (abstract) instruction set, and the mapping to arm is quite - straightforward. Since the instruction set is implemented as derived classes, additional - instructions may be defined and used later, as long as translation is provided for them too. - In other words the instruction set is extensible (unlike cpu instruction sets). -

    -

    - Basic object oriented concepts are needed already at this level, to be able to generate a whole - self contained system. Ie what an object is, a class, a method etc. This minimal runtime is called - parfait and will be coded in soml eventually. But since it is the same objects at runtime and - compile time, it will then be translated back to ruby for use at compile time. Currenty there - are two versions of the code, in ruby and soml, being hand synchronized. More about parfait below. -

    -

    - Since working with at this low machine level (essentially assembler) is not easy to follow for - everyone, an interpreter was created. Later a graphical interface, a kind of - visual debugger was added. - Visualizing the control flow and being able to see values updated immediately helped - tremendously in creating this layer. And the interpreter helps in testing, ie keeping it - working in the face of developer change. -

    -
    - -
    -
    Soml, Salama object machine language
    -

    - Soml is probably the larest single part of the system and much more information can be found - here . -
    - Before soml, a more traditional virtual machine approach was taken and abandoned. The language - is easy to understand and provides a good abstraction, both in terms of object orienteation, - and in terms of how this is expressed in the register model.
    - It is like ruby with out the dynamic aspects, but typed.
    - In broad strokes it consists off: -

    -

    -

    - Just to summarize a few of soml features that are maybe unusual: -

    -

    -
    - -
    -
    Salama
    -

    - To compile and run ruby, we need to parse and compile ruby code. To compile ruby to soml a clear - mapping has to be achieved. Particularly the dynamic aspects, and typing need to be addressed. -
    - While parsing ruby is quite a difficult task, it has already been implemented in pure ruby - here . The output of the parser is again - an ast, which needs to be compiled to soml.
    - The dynamic aspects of ruby are actually realtively easy to handle, once the whole system is - in place, because the whole system is written in ruby without external dependencies. - Since (when finished) it can compile ruby, it can do so to produce a binary. This binary can - then contain the whole of the system, and so the resulting binary will be able to produce - binary code when it runs. With small changes to the linking process (easy in ruby!) it can - then extend itself. -

    -

    - The type aspect is more tricky: Ruby is not typed and soml is after all. And if everything - were objects (as we like to pretend in ruby) we could just do a lot of dynamic checking, - possibly later introduce some caching. But everything is not an object, minimally integers - are not, but maybe also floats and other values. The destinction between what is an integer - and what an object has sprouted an elaborate type system, which is (by necessity) present in - soml (see there). -

    -

    - The idea (because it hasn't been implemented yet) is to have different functions for different - types. The soml layer defines the Type class and BasicTypes and also lets us return to different - places from a function (in effect a soml function call is like an if). By using this, we can - compile a single ruby method into several soml functtions. Each such function is typed, ie all - arguments and variables are of known type. According to these types we can call functions according - to their signatures. Also we can autognerate error methods for unhandled types, and predict - that only a fraction of the possible combinations will actually be needed. -

    -
    diff --git a/salama/layers.md b/salama/layers.md new file mode 100644 index 0000000..59a6f9f --- /dev/null +++ b/salama/layers.md @@ -0,0 +1,132 @@ +--- +layout: salama +title: Salama architectural layers +--- + +## Main Layers + +To implement an object system to execute object oriented languages takes a large system. +The parts or abstraction layers are detailed below. + +It is important to understand the approach first though, as it differs from the normal +interpretation. The idea is to **compile** ruby. It may be easiest to compare to a static +object oriented language like c++. When c++ was created c++ code was translated into c, which +then gets translated into assembler, which gets translated to binary code, which is linked +and executed. Compiling to binaries is what gives these languages speed, and is the reason +to compile ruby. + +In a similar way to the c++ example, we need a level between ruby and assembler, as it is too +big a mental step from ruby to assembler. Of course one could try to compile to c, but +since c is not object oriented that would mean dealing with all of c's non oo heritage, like +linking model, memory model, calling convention etc. + +Top down the layers are: + +- **Melon** compiles ruby code into the typed layer and includes bootstrapping code + +- **Typed intermediate layer:** Statically typed object oriented with object oriented +call semantics. + +- **Risc register machine abstraction** provides a level of machine abstraction, but + as the name says, quite a simple one. + +- **Binary and cpu specific assembler** This includes arm assembly and elf support + to produce a binary that can then read in ruby programs + +### Melon + +To compile and run ruby, we need to parse and compile ruby code. While parsing ruby is quite +a difficult task, it has already been implemented in pure ruby +[here](https://github.com/whitequark/parser). The output of the parser is again +an ast, which needs to be compiled to the typed layer. 
+ +The dynamic aspects of ruby are actually relatively easy to handle, once the whole system is +in place, because the whole system is written in ruby without external dependencies. +Since (when finished) it can compile ruby, it can do so to produce a binary. This binary can +then contain the whole of the system, and so the resulting binary will be able to produce +binary code when it runs. With small changes to the linking process (easy in ruby!) it can +then extend itself. + +The type aspect is more tricky: Ruby is not typed, but the typed layer is. And +if everything were objects (as we like to pretend in ruby) we could just do a lot of +dynamic checking, possibly later introduce some caching. But everything is not an object, +minimally integers are not, but maybe also floats and other values. +The distinction between what is an integer and what an object has sprouted an elaborate +type system, which is (by necessity) present in the typed layer. + + + +### Typed intermediate layer + +The Typed intermediate layer is more fully described [here](/typed/typed.html) + +In broad strokes it consists of: + +- **MethodCompiler:** compiles the ast into a sequence of Register instructions + and runtime objects (classes, methods etc) +- **Parfait:** Is the runtime, ie the minimal set of objects needed to + create a binary with the required information to be dynamic +- **Builtin:** A very small set of primitives that are impossible to express in ruby + +The idea is to have different methods for different types, but implementing the same ruby +logic. In contrast to the usual 1-1 relationship between a ruby method and its binary +definition, there is a 1-n. + +The typed layer defines the Type class and BasicTypes and also lets us return to different +places from a function. By using this, we can +compile a single ruby method into several typed functions. Each such function is typed, ie all +arguments and variables are of known type. 
According to these types we can call functions according +to their signatures. Also we can autogenerate error methods for unhandled types, and predict +that only a fraction of the possible combinations will actually be needed. + + +Just to summarize a few of the typed layer features that are maybe unusual: + +- **Message based calling:** Calling is completely object oriented (not stack based) + and uses Message and Frame objects. +- **Return addresses:** A method call may return to several addresses, according + to type, and in case of exception +- **Cross method jumps** When a type switch is detected, a method may jump into the middle + of another method. + + +### Register Machine + +The Register machine layer is a relatively close abstraction of risc hardware, but without the +quirks. + +The register machine has registers, indexed addressing, operators, branches and everything +needed for the next layer. It does not try to abstract every possible machine feature +(like llvm), but rather "objectifies" the risc view to provide what is needed for the typed +layer, the next layer up. + +The machine has its own (abstract) instruction set, and the mapping to arm is quite +straightforward. Since the instruction set is implemented as derived classes, additional +instructions may be defined and used later, as long as translation is provided for them too. +In other words the instruction set is extensible (unlike cpu instruction sets). + +Basic object oriented concepts are needed already at this level, to be able to generate a whole +self contained system. Ie what an object is, a class, a method etc. This minimal runtime is called +parfait, and the same objects will be used at runtime and compile time. + +Since working at this low machine level (essentially assembler) is not easy to follow for +everyone, an interpreter was created. Later a graphical interface, a kind of +[visual debugger](https://github.com/salama/salama-debugger) was added. 
+Visualizing the control flow and being able to see values updated immediately helped +tremendously in creating this layer. And the interpreter helps in testing, ie keeping it +working in the face of developer change. + + +### Binary , Arm and Elf + +A physical machine will run binaries containing instructions that the cpu understands, in a +format the operating system understands (elf). Arm and elf subdirectories hold the code for +these layers. + +Arm is a risc architecture, but anyone who knows it will attest, with it's own quirks. +For example any instruction may be executed conditionally in arm. Or there is no 32bit +register load instruction. It is possible to create very dense code using all the arm +special features, but this is not implemented yet. + +All Arm instructions are (ie derive from) Register instruction and there is an ArmTranslator +that translates RegisterInstructions to ArmInstructions. diff --git a/typed/parfait.md b/typed/parfait.md index 8689b94..ebdc40c 100644 --- a/typed/parfait.md +++ b/typed/parfait.md @@ -1,49 +1,41 @@ --- layout: typed -title: Parfait, soml's runtime +title: Parfait, a minimal runtime --- -#### Overview - -Soml, like ruby, has open classes. This means that a class can be added to by loading another file -with the same class definition that adds fields or methods. The effect of this is that in designing -the runtime, we can concentrate on a minimal function set. - -This means all the functionality the compiler need to get the job done, mostly class and type -structure related functionality with it's support. - -### Value and Object - -In soml object is not the root of the class hierarchy, but Value is. Integer, Float and Object are -derived from Value. So an integer is *not* an object, but still has a class and methods, just no -instance variables. - ### Type and Class -Each object has a type that describes the instance variables and types of the object. It also -reference the class of the object. 
Type objects are constant, may not be changed over their -lifetime. When a field is added to a class, a new Type is created. +Each object has a type that describes the instance variables and basic types of the object. +Types also reference the class they implement. +Type objects are unique and constant, may not be changed over their lifetime. +When a field is added to a class, a new Type is created. For a given class and combination +of instance names and basic types, only one instance ever exists describing that type (a bit +similar to symbols) -A Class describes a set of objects that respond to the same methods (methods are store in the class). +A Class describes a set of objects that respond to the same methods (the method source is stored +in the RubyMethod class). A Type describes a set of objects that have the same instance variables. ### Method, Message and Frame -The Method class describes a declared method. It carries a name, argument names and types and -several description of the code. The parsed ast is kept for later inlining, the register model -instruction stream for optimisation and further processing and finally the cpu specific binary +The TypedMethod class describes a callable method. It carries a name, argument and local variable +type and several descriptions of the code. +The typed ast is kept for debugging, the register model instruction stream for optimisation +and further processing and finally the cpu specific binary represents the executable code. -When Methods are invoked, A message object (instance of Message class) is populated. Message objects -are created at compile time and form a linked list. The data in the Message holds the receiver, -return addresses, arguments and a frame. Frames are also created at compile time and just reused -at runtime. +When TypedMethods are invoked, a message object (instance of Message class) is populated. +Message objects are created at compile time and form a linked list. 
+The data in the Message holds the receiver, return addresses, arguments and a frame. +Frames are also created at compile time and just reused at runtime. ### Space and support -The single instance of Space hold a list of all Classes, which in turn hold the methods. -Also the space holds messages will hold memory management objects like pages. +The single instance of Space hold a list of all Types and all Classes, which in turn hold +the methods. +Also the space holds messages and will hold memory management objects like pages. Words represent short immutable text and other word processing (buffers, text) is still tbd. -Lists are number indexed, starting at one, and dictionaries are mappings from words to objects. + +Lists (aka Array) are number indexed, starting at one, and dictionaries (aka Hash) are mappings from words to objects. diff --git a/typed/typed.md b/typed/typed.md index 13ae18e..26a7f1f 100644 --- a/typed/typed.md +++ b/typed/typed.md @@ -3,70 +3,55 @@ layout: typed title: Typed intermediate representation --- -### Disclaimer +### Intermediate representation -The som Language was a stepping stone: it will go. The basic idea is good and will stay, but the -parser, and thus it's existence as a standalone language, will go. +Compilers use different intermediate representations to go from the source code to a binary, +which would otherwise be too big a step. -What will remain is traditionally called an intermediate representation. Basically the layer into -which the soml compiler compiles to. As such these documents will be rewritten soon. - -#### Top down designed language - -Soml is a language that is designed to be compiled into, rather than written, like -other languages. It is the base for a higher system, -designed for the needs to compile ruby. It is not an endeavor to abstract from a -lower level, like other system languages, namely off course c. 
- -Still it is a system language, or an object machine language, so almost as low level a -language as possible. Only assembler is really lower, and it could be argued that assembler -is not really a language, rather a data format for expressing binary code. +The **typed** intermediate representation is a strongly typed layer, between the dynamically typed +ruby above, and the register machine below. One can think of it as a mix between c and c++, +minus the syntax aspect. While in 2015, this layer existed as a language, (see soml-parser), it +is now a tree representation only. -##### Object oriented to the core, including calling convention +#### Object oriented to the core, including calling convention -Soml is completely object oriented and strongly typed. Types are modelled as classes and carry -information about instance variable names and their basic type. *Every* object stores a reference -to it's types, and while types are immutable, the reference may change. The basic types every +Types are modeled by the class Type and carry information about instance variable names +and their basic type. *Every object* stores a reference +to it's type, and while **types are immutable**, the reference may change. The basic types every object is made up off, include at least integer and reference (pointer). The object model, ie the basic properties of objects that the system relies on, is quite simple and explained in the runtime section. It involves a single reference per object. -Also the object memory model is kept quite simple in that objects are always small multiples +Also the object memory model is kept quite simple in that object sizes are always small multiples of the cache size of the hardware machine. We use object encapsulation to build up larger looking objects from these basic blocks. The calling convention is also object oriented, not stack based*. Message objects used to define the data needed for invocation. 
They carry arguments, a frame and return addresses. -In Soml return addresses are pre-calculated and determined by the caller, and yes, there +Return addresses are pre-calculated and determined by the caller, and yes, there are several. In fact there is one return address per basic type, plus one for exception. A method invocation may thus be made to return to an entirely different location than the caller. -\*(A stack, as used in c, is not typed and as such a source of problems) +\*(A stack, as used in c, is not typed, not object oriented, and as such a source of problems) -There is no non- object based memory in soml. The only global constants are instances of -classes that can be accessed by writing the class name in soml source. +There is no non- object based memory at all. The only global constants are instances of +classes that can be accessed by writing the class name in ruby source. -##### Syntax and runtime +#### Runtime / Parfait -Soml syntax is a mix between ruby and c. I is like ruby in the sense that semicolons and even -newlines are not neccessary unless they are. Soml still uses braces, but that will probably -be changed. +The typed representation layer depends on the higher layer to actually determine and instantiate +types (type objects, or objects of class Type). This includes method arguments and local variables. -But off course it is typed, so in argument or variable definitions the type must be specified -like in c. Type names are the class names they represent, but the "int" may be used for brevity -instead of Integer. Return types are also declared, though more for static analysis. As mentioned a -function may return to different addresses according to type. The compiler automatically inserts -errors for return types that are not handled by the caller. 
-The complete syntax and their translation is discussed [here](syntax.html) +The typed layer is mainly concerned in defining TypedMethods, for which argument or local variable +have specified type (like in c). Basic Type names are the class names they represent, +but the "int" may be used for brevity +instead of Integer. +As mentioned a function may return to different addresses according to type, though this is not +fully implemented. -As soml is the base for dynamic languages, all compile information is recorded in the runtime. -All information is off course object oriented, ie in the form off objects. This means a class -hierarchy, and this itself is off course part of the runtime. The runtime, Parfait, is kept +The runtime, Parfait, is kept to a minimum, currently around 15 classes, described in detail [here](parfait.html). - Historically Parfait has been coded in ruby, as it was first needed in the compiler. This had the additional benefit of providing solid test cases for the functionality. -Currently the process is to convert the code into soml, using the same compiler used to compile -ruby.