rename soml to typed

Torsten Ruger
2016-12-19 17:43:59 +02:00
parent f8adf107fe
commit 1175a8eb97
12 changed files with 21 additions and 21 deletions

typed/bench.numbers Normal file

Binary file not shown.

typed/bench.png Normal file

Binary file not shown.


54
typed/benchmarks.md Normal file

@ -0,0 +1,54 @@
---
layout: typed
title: Simple soml performance numbers
---
These benchmarks were made to find places for optimization. This early on it is clear that
performance is not outstanding, but there were still some surprises.

- loop - the program runs an empty loop of the same size as hello
- hello - output hello world (to /dev/null) to measure kernel calls (not terminal speed)
- itos - convert the integers from 1 to 100000 to strings
- add - run integer additions by computing fibonacci of 40 linearly
- call - exercise calling by computing fibonacci of 20 recursively

Hello, itos and add run 100_000 iterations per program invocation to remove startup overhead.
Call only has 10000 iterations, as it is much slower, executing about 10000 calls per invocation.
Gcc was used to compile the c code on the test machine; the soml executables were produced by ruby
(on another machine).
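
For reference, the three compute workloads can be sketched in ruby as below. This is a hypothetical reconstruction from the list above, not the actual benchmark source; the fibonacci base case and the digit loop of `itos` are assumptions.

```ruby
# Hypothetical ruby versions of the benchmark workloads; the real
# benchmarks were written in soml and c, these only mirror the idea.

# add: linear fibonacci, one integer addition per step
def fib_linear(n)
  a, b = 0, 1
  n.times { a, b = b, a + b }
  a
end

# call: recursive fibonacci, dominated by call overhead
def fib_recursive(n)
  return n if n < 2
  fib_recursive(n - 1) + fib_recursive(n - 2)
end

# itos: integer to string by repeated division by 10 (n >= 0 assumed)
def itos(n)
  return "0" if n.zero?
  digits = []
  while n > 0
    n, digit = n.divmod(10)
    digits.unshift((48 + digit).chr) # 48 is the character code of "0"
  end
  digits.join
end
```

With these definitions `fib_linear(40)` is 102334155 and `fib_recursive(20)` is 6765.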
### Results
Results were measured by a ruby script. Runs were repeated, measuring mean and variance, until the
variance was low, always under one percent.
The machine was a virtual arm running on a powerbook, with performance roughly equivalent to a
raspberry pi. Still, the results should be seen as relative, not absolute (some were scaled).
![Graph](bench.png)
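
The measuring script itself is not shown here. A minimal sketch of the "repeat until the variance is low" loop could look like this, assuming "variance under one percent" means the coefficient of variation (standard deviation divided by mean) drops below 1%; method names and limits are hypothetical.

```ruby
# Repeat a measurement until the spread is small. Assumption: "variance
# under one percent" is read as the coefficient of variation (standard
# deviation divided by mean) dropping below 1%.
def measure_until_stable(threshold: 0.01, min_runs: 5, max_runs: 100)
  samples = []
  loop do
    samples << yield.to_f # the block returns one timing in seconds
    n = samples.size
    mean = samples.sum / n
    variance = samples.sum { |s| (s - mean)**2 } / n
    cv = mean.zero? ? 0.0 : Math.sqrt(variance) / mean
    if (n >= min_runs && cv < threshold) || n >= max_runs
      return { mean: mean, cv: cv, runs: n }
    end
  end
end

# Time one run of the block with a monotonic clock.
def time_once
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end
```

A hypothetical call would be `measure_until_stable { time_once { system("./hello > /dev/null") } }`, where `./hello` stands for one compiled benchmark.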
### Discussion
Surprisingly, there are areas where soml code runs faster than c. In the hello example especially, this
may not mean much: printf does caching and has a lot of functionality, so it may not be a straight
comparison. The loop result is surprising and needs to be examined.
The add example is slower because of the different memory model and the lack of optimisation in soml.
Every result of an arithmetic operation is immediately written to memory in soml, whereas c will
keep values in registers as long as it can, which in this example is the whole time. This can
be improved upon with register code optimisation, which can cut loads that directly follow writes, and
writes that are overwritten before calls or jumps are made.
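
The two rules can be sketched as a small peephole pass. The instruction format below is a hypothetical toy, not salama's register instruction classes, and it assumes that distinct slot names never alias the same memory.

```ruby
# Toy instruction format (hypothetical, not salama's register classes):
#   [:store, reg, slot]  write a register to an object slot
#   [:load,  reg, slot]  read an object slot into a register
#   [:call] / [:jump]    memory must be consistent before these
#   anything else        register-only work, e.g. [:add, :r1, :r2]
# Assumes distinct slot names never alias the same memory.

# Rule 1: a load straight after a store of the same register to the same
# slot is redundant, because the register still holds that value.
def drop_redundant_loads(code)
  kept = code.each_index.reject do |i|
    ins = code[i]
    prev = i > 0 ? code[i - 1] : nil
    ins[0] == :load && prev && prev[0] == :store && prev[1, 2] == ins[1, 2]
  end
  kept.map { |i| code[i] }
end

# True if the store at index i is overwritten before anything can see it.
def overwritten_before_use?(code, i)
  slot = code[i][2]
  code.drop(i + 1).each do |op, _reg, s|
    return false if op == :call || op == :jump
    return false if op == :load && s == slot
    return true if op == :store && s == slot
  end
  false # reached the end: the value must stay in memory
end

# Rule 2: drop stores that are overwritten before a load, call or jump.
def drop_dead_stores(code)
  kept = code.each_index.reject do |i|
    code[i][0] == :store && overwritten_before_use?(code, i)
  end
  kept.map { |i| code[i] }
end

def optimise(code)
  drop_dead_stores(drop_redundant_loads(code))
end
```

Calls and jumps act as barriers, so memory is still consistent wherever control can leave the straight-line code.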
The call example was expected to take longer, as a typed model is used and runtime information (like the
method name) is made available. It is actually a small price to pay for the ability to generate code at
runtime, and will of course reduce drastically with inlining.
The itos result was also to be expected, as it relies on both calling and arithmetic. Also, itos
relies heavily on division by 10, which, when coded in cpu specific assembler, may easily be sped up
by a factor of 2-3.
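
One well known way to speed up division by a constant (not necessarily what salama would use) is to multiply by a precomputed reciprocal and shift. For unsigned 32-bit values, multiplying by 0xCCCCCCCD and shifting right by 35 gives the exact quotient; on a 32-bit arm this is one `umull` followed by a shift of the high word by 3. The ruby below only checks the arithmetic.

```ruby
# Division by 10 without a divide instruction, valid for unsigned 32-bit x:
#   x / 10 == (x * 0xCCCCCCCD) >> 35
# 0xCCCCCCCD is 2**35 / 10 rounded up; the rounding error is small enough
# that the quotient is exact for every x in 0...2**32.
MAGIC_DIV10 = 0xCCCCCCCD

def div10(x)
  unless x.between?(0, 0xFFFF_FFFF)
    raise ArgumentError, "expects an unsigned 32-bit value"
  end
  (x * MAGIC_DIV10) >> 35
end
```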
All in all, the results are encouraging, as no optimization effort has been made yet. Of course, the
most encouraging fact is that the system works, and thus may be used as the basis of a dynamic
code generator, as opposed to having to interpret.

96
typed/debugger.md Normal file

@ -0,0 +1,96 @@
---
layout: typed
title: Register Level Debugger / simulator
---
![Debugger](https://raw.githubusercontent.com/salama/salama-debugger/master/static/debugger.png)
## Views
From left to right there are several views showing different data and controls.
All of the green boxes are in fact pop-up menus and can show more information.
Most of these are implemented as a single class, with the name reflecting which part it shows.
I wrote 2 base classes that handle element generation (ie there is hardly any html involved, just elements).
### Switch view
At the top left is a little control to switch files.
The files need to be in the repository, but at least one can have several and switch between
them without stopping the debugger.
Parsing is the only thing that opal chokes on, so the files are parsed by a server script and the
ast is sent to the browser.
### Classes View
The first column on the left is a list of classes in the system. Like on all boxes, one can hover
over a name to look at the class and its instance variables (recursively).
### Source View
Next is a view of the Soml source. The source is reconstructed from the ast as html.
Soml (Salama object machine language) is a statically typed language,
maybe in spirit close to c++ (without the c). In the future Salama will compile ruby to soml.
While stepping through the code, those parts of the code that are active get highlighted in blue.
Currently stepping is done only in register instructions, which means that depending on the source
constructs it may take many steps for the cursor to move on.
Each step will show progress on the register level though (next view).
### Register Instruction view
Salama defines a register machine level which is quite close to the arm machine, but with more
sensible names. It has 16 registers (below) and an instruction set that is useful for Soml.
Data movement related instructions implement an indexed get and set. There are also constant loads,
integer operators and of course branches.
Instructions print their name and the registers they use (r0-r15).
The next instruction to be executed is highlighted in blue. A list of previous instructions is shown.
One can follow the effect of each instruction in the register view below.
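
To make the instruction set concrete, here is a minimal ruby sketch of such a machine. The instruction names, their operand order and the use of ruby arrays as objects are hypothetical; only the overall shape (16 registers, message in register 0, indexed get and set, constant load, integer operators, branches, an instruction count) follows the text.

```ruby
# Minimal register machine sketch. Instruction names and encodings are
# hypothetical; objects are ruby arrays of slots and r0 holds the message.
class ToyMachine
  attr_reader :registers, :count

  def initialize(program, message)
    @program = program
    @registers = Array.new(16) # r0 - r15
    @registers[0] = message    # current message, like a stack pointer
    @pc = 0
    @count = 0                 # instruction count, as in the status view
  end

  def done?
    @pc >= @program.size
  end

  # Execute exactly one instruction, like the Next button.
  def step
    op, *args = @program[@pc]
    @pc += 1
    @count += 1
    case op
    when :load_constant # reg, value
      @registers[args[0]] = args[1]
    when :get_slot      # dst, object_reg, index
      @registers[args[0]] = @registers[args[1]][args[2]]
    when :set_slot      # src, object_reg, index
      @registers[args[1]][args[2]] = @registers[args[0]]
    when :operator      # symbol, dst, left, right
      @registers[args[1]] = @registers[args[2]].public_send(args[0], @registers[args[3]])
    when :branch        # target
      @pc = args[0]
    when :branch_zero   # reg, target
      @pc = args[1] if @registers[args[0]].zero?
    else
      raise ArgumentError, "unknown instruction #{op.inspect}"
    end
  end

  def run
    step until done?
    self
  end
end
```

For example, a program that reads a counter from slot 0 of the message, counts it down to zero while summing, and writes the sum back into slot 1 only ever touches memory through r0.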
### Status View
The last view at the top right shows the status of the machine (interpreter to be precise), the
instruction count and any stdout.
Current controls include stepping and three speeds of running the program.

- Next (green button) will execute exactly one instruction when clicked. Mostly useful when
debugging the compiler, ie inspecting the generated code.
- Crawl (first blue button) will execute at a moderate speed. One can still follow the
logic at the register level.
- Run (second blue button) runs the program at a higher speed, where register instructions just
whizz by, but one can still follow the source view. Mainly used to verify that the source executes
as expected, and also to get to a specific place in the program (in the absence of breakpoints).
- Wizz (third blue button) makes the program run so fast that its only useful function is to
fast forward in the code (while debugging).
### Register view
The bottom part of the screen is taken up by the 16 registers. As we execute an object oriented
language, we show the object contents if a register holds an object (not an integer).
The (virtual) machine only uses objects, and specifically a linked list of Message objects to
make calls. The current message is always in register 0 (analogous to a stack pointer).
All other registers are scratch for statement use.
In Soml, expressions compile to the register that holds the expression's value, and statements may use
all registers but may not rely on anything other than the message in register 0.
The Register view is now greatly improved, especially in its dynamic features:

- when the contents update, the register obviously updates
- when the object that the register holds updates, the new value is shown immediately
- hovering over a variable will **expand that variable**
- the hovering works recursively, so it is possible to drill down into objects for several levels

The last feature, inspecting objects, is shown in the screenshot. This makes it possible
to very quickly verify the program's behaviour. As it is a pure object system, all data is in
objects, and all objects can be inspected.
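
The debugger's own inspection code is not shown here. As a rough illustration of recursive drilling down, this hypothetical helper walks instance variables with plain ruby reflection (`instance_variables`, `instance_variable_get`) up to a depth limit.

```ruby
# Recursive object inspection with a depth limit, in the spirit of hovering
# in the register view. Leaf values (no instance variables) print directly;
# objects already shown are not expanded again, which also stops cycles.
def inspect_object(obj, depth = 2, seen = {})
  ivars = obj.instance_variables
  return obj.inspect if ivars.empty?
  return "#<#{obj.class} ...>" if depth.zero? || seen[obj.object_id]
  seen[obj.object_id] = true
  fields = ivars.map do |name|
    "#{name}=#{inspect_object(obj.instance_variable_get(name), depth - 1, seen)}"
  end
  "#<#{obj.class} #{fields.join(', ')}>"
end
```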

49
typed/parfait.md Normal file

@ -0,0 +1,49 @@
---
layout: typed
title: Parfait, soml's runtime
---
#### Overview
Soml, like ruby, has open classes. This means that a class can be extended by loading another file
with the same class definition that adds fields or methods. The effect of this is that in designing
the runtime, we can concentrate on a minimal function set: all the functionality the compiler
needs to get the job done, mostly class and type structure related functionality and its support.
### Value and Object
In soml, Object is not the root of the class hierarchy; Value is. Integer, Float and Object are
derived from Value. So an integer is *not* an object, but still has a class and methods, just no
instance variables.
### Type and Class
Each object has a type that describes the instance variables of the object and their types. The type
also references the class of the object. Type objects are constant: they may not be changed over their
lifetime. When a field is added to a class, a new Type is created.
A Class describes a set of objects that respond to the same methods (methods are stored in the class).
A Type describes a set of objects that have the same instance variables.
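
A hypothetical ruby sketch of the "types never change, adding a field creates a new Type" rule follows. The class and method names are assumptions modelled on the description, not Parfait's actual code.

```ruby
# Immutable types: adding a field yields a new Type and leaves the old one
# as it was. Names are hypothetical, modelled on the text, not Parfait code.
class Type
  attr_reader :object_class, :names, :types

  def initialize(object_class, names = [], types = [])
    @object_class = object_class # the class this type belongs to
    @names = names.dup.freeze    # instance variable names
    @types = types.dup.freeze    # their basic types
    freeze                       # a Type is never changed after creation
  end

  # A new Type with one more instance variable; self is untouched.
  def add_field(name, type)
    Type.new(object_class, names + [name], types + [type])
  end

  def index_of(name)
    names.index(name)
  end
end
```

An object whose class gains a field would then have its type reference switched to the new Type, while objects still pointing at the old Type remain valid.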
### Method, Message and Frame
The Method class describes a declared method. It carries a name, argument names and types, and
several descriptions of the code: the parsed ast is kept for later inlining, the register model
instruction stream for optimisation and further processing, and finally the cpu specific binary
represents the executable code.
When methods are invoked, a message object (instance of the Message class) is populated. Message objects
are created at compile time and form a linked list. The data in the Message holds the receiver,
return addresses, arguments and a frame. Frames are also created at compile time and just reused
at runtime.
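
As a rough illustration of calling without a stack, the sketch below preallocates a linked list of messages, each with its frame, and a call simply fills in the next one. Field names are hypothetical, and a single return address stands in for the several return addresses a real message carries.

```ruby
# Messages and frames allocated up front as a linked list; a call fills in
# the next message instead of pushing onto a stack. Field names are
# hypothetical, and a single return address stands in for the real ones.
Frame = Struct.new(:locals)
Message = Struct.new(:receiver, :arguments, :return_address, :frame,
                     :next_message, :previous_message)

def build_messages(count)
  messages = Array.new(count) { Message.new(nil, [], nil, Frame.new({})) }
  messages.each_cons(2) do |current, following|
    current.next_message = following
    following.previous_message = current
  end
  messages.first
end

# Caller side of an invocation: populate the next, already existing, message.
def prepare_call(current, receiver, arguments, return_address)
  callee = current.next_message or raise "out of messages"
  callee.receiver = receiver
  callee.arguments = arguments
  callee.return_address = return_address
  callee
end
```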
### Space and support
The single instance of Space holds a list of all Classes, which in turn hold the methods.
The space also holds the messages, and will hold memory management objects like pages.
Words represent short immutable text; other text processing (buffers, text) is still tbd.
Lists are number indexed, starting at one, and dictionaries are mappings from words to objects.
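
Since one-based indexing is easy to get wrong when coming from ruby or c, here is a small hypothetical wrapper that shows the convention; it is an illustration only, not the Parfait List class.

```ruby
# A list indexed from one, wrapping a ruby array (hypothetical class name).
class OneBasedList
  def initialize
    @items = []
  end

  def push(item)
    @items.push(item)
    self
  end

  def length
    @items.length
  end

  # Valid indexes are 1..length; index 1 is the first element.
  def get(index)
    check(index)
    @items[index - 1]
  end

  def set(index, value)
    check(index)
    @items[index - 1] = value
  end

  private

  def check(index)
    return if index.between?(1, length)
    raise IndexError, "index #{index} outside 1..#{length}"
  end
end
```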

148
typed/syntax.md Normal file

@ -0,0 +1,148 @@
---
layout: typed
title: Soml Syntax
---
#### Top level Class and methods
The top level declarations in a file may only be class definitions:

```
class Dictionary < Object
  int add(Object o)
    ... statements
  end
end
```

The class hierarchy is explained [here](parfait.html). The superclass can be left out, in which case
Object is assumed.
Methods must be typed, both arguments and return. Generally class names serve as types, but "int" can
be used as a shortcut for Integer.
Unlike in ruby, code may not be outside method definitions. A compiled program starts at the builtin
method `__init__`, which does the initial setup, and then jumps to **Space.main**.
Classes are represented by class objects (instances of class Class to be precise) and methods by
Method objects, so all information is available at runtime.
#### Expressions
Soml distinguishes between expressions and statements. Expressions have a value, statements perform an
action. Both are compiled to Register level instructions for the current method. Generally speaking,
expressions store their value in a register, and statements store those values elsewhere, possibly
after operating on them.
The subsections below correspond roughly to the parser's rule names.
**Basic expressions** are numbers (integer or float), strings or names, either variable, argument,
field or class names (normal details apply). Special names include self (the current
receiver) and message (the currently executed method frame). These all resolve to a register
with contents.

```
23
"hi there"
argument_name
Object
```

A **field access** resolves to the field's value at the time of access. Fields must be defined by
field definitions, and are basically instance variables, but not hidden (see below).
The example below also shows how to define local variables at the same time. Notice that chaining,
both for field access and calls, is not allowed.

```
Type l = self.type
Class c = l.object_class
Word n = c.name
```

A **Call expression** is a method call that resolves to the method's return value. If no receiver is
specified, self (the current receiver) is used. The receiver may be any of the basic expressions
above, so also class instances. The receiver type is known at compile time, as are all argument
types, so the class of the receiver is searched for a matching method. Many methods of the same
name may exist, but to issue a call, an exact match for the argument types must be found.

```
Class c = self.get_class()
c.get_super_class()
```

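
The exact-match rule can be sketched as a lookup over the receiver class's methods. The records below are hypothetical, and only the receiver's own class is searched because the text does not say whether superclasses are searched too.

```ruby
# Call resolution by exact argument types. Methods are hypothetical
# name/argument-type records; only the receiver's own class is searched
# here (whether superclasses are searched is not stated above).
MethodDef = Struct.new(:name, :arg_types, :return_type)

# "int" is only a shortcut for Integer, so compare canonical names.
def canonical(types)
  types.map { |t| t == :int ? :Integer : t }
end

def resolve(defs, name, arg_types)
  wanted = canonical(arg_types)
  found = defs.find { |m| m.name == name && canonical(m.arg_types) == wanted }
  found or raise NoMethodError, "no method #{name}(#{wanted.join(', ')})"
end
```

Two methods named `add`, one taking an `Object` and one an `int`, can then coexist; a call with an `Integer` argument picks the second, and a call with any other argument type fails at compile time instead of falling back to a looser match.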
An **operator expression** is a binary expression, with any of the other expressions as left
and right operands, and an operator symbol between them. Operand types must be integer.
The symbols allowed are the normal arithmetic and logical operations.

```
a + b
counter | 255
mask >> shift
```

Operator expressions may be used in assignments and conditions, but not in calls, where the result
would have to be assigned beforehand. This is one of those cases where soml's low level approach
shines through, as soml has no auto-generated temporary variables.
#### Statements
We have seen the top level statements above. In methods the most interesting statements relate to
flow control and specifically how conditionals are expressed. This differs somewhat from other
languages, in that the condition is expressed explicitly (not implicitly like in c or ruby).
This lets the programmer express more precisely what is tested, and also opens an extensible
framework for more tests than available in other languages. Specifically overflow may be tested in
soml, without dropping down to assembler.
An **if statement** starts with the keyword `if_`, directly followed by the branch type. The branch
type may be *plus, minus, zero, nonzero or overflow*. The condition must be in brackets and can be
any expression. *If* may be continued with an *else*, but doesn't have to be, and is ended with *end*.

```
if_zero(a - 5)
  ....
else
  ....
end
```

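
How each branch type could be decided is sketched below. This sketch assumes a 32-bit two's complement word, and that *plus* also covers zero, as arm's PL condition does; the text does not specify either. Overflow here means the exact result does not fit in a machine word, which for the sum or difference of two in-range words matches signed overflow.

```ruby
# How the if_/while_ branch types could be decided on a result.
# Assumptions: 32-bit two's complement words, and "plus" includes zero
# (like arm's PL condition); neither is specified above.
WORD_BITS = 32
WORD_MIN = -(2**(WORD_BITS - 1))

# Wrap an exact ruby integer to a machine word and report overflow.
def to_word(exact)
  wrapped = ((exact - WORD_MIN) % 2**WORD_BITS) + WORD_MIN
  [wrapped, wrapped != exact]
end

def branch_taken?(kind, exact_result)
  value, overflow = to_word(exact_result)
  case kind
  when :plus     then value >= 0
  when :minus    then value < 0
  when :zero     then value.zero?
  when :nonzero  then !value.zero?
  when :overflow then overflow
  else raise ArgumentError, "unknown branch type #{kind}"
  end
end
```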
A **while statement** is very much like an if, with of course the normal loop semantics, and
without the possible else.

```
while_plus( counter )
  ....
end
```

A **return statement** returns a value from the current function. There are no void functions.

```
return 5
```

A **field definition** is to declare an instance variable on an object. It starts with the keyword
field, must be in class (not method) scope and may not be assigned to.

```
class Class < Object
  field List instance_methods
  field Type object_type
  field Word name
  ...
end
```

A **local variable definition** declares, and possibly assigns to, a local variable. Local variables
are stored in frame objects, in fact they are instance variables of the current frame object.
When resolving a name, the compiler checks argument names first, and then local variables.

```
int counter = 0
```

Any of the expressions may be assigned to the variable at the time of definition. After a variable is
defined it may be assigned to with an **assignment statement** any number of times. The assignment
is like an assignment during definition, without the leading type.

```
counter = 0
```

Any of the expressions, basic, call, operator, field access, may be assigned.
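
The lookup order for names (arguments first, then local variables) can be sketched as follows. Since both live in objects (the message and its frame), the sketch returns which object and which slot a name resolves to; the helper and the use of hashes for declarations are hypothetical.

```ruby
# Lookup order for names: arguments first, then local variables.
# Declarations are modelled as ordered hashes of name => type; the result
# says where the value lives and at which slot index (hypothetical helper).
def resolve_name(name, arguments, locals)
  if arguments.key?(name)
    [:argument, arguments.keys.index(name)]
  elsif locals.key?(name)
    [:local, locals.keys.index(name)]
  else
    raise NameError, "undefined name #{name}"
  end
end
```

Because arguments are checked first, a local variable with the same name as an argument would never be found.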
### Code generation and scope
Compiling generates two results simultaneously. The more obvious is code for a function, but also an
object structure of classes etc that capture the declarations. To understand the code part better
the register abstraction should be studied, and to understand the object structure the runtime.
The register machine abstraction is very simple, and so is the code generation, in favour of a simple
model. Especially in the area of register assignment, there is no magic and only a few simple rules.
The main one of those concerns main memory access ordering, and states that object memory must
be consistent at the end of each statement. Since there is only object memory in soml, this
concerns all assignments, since all variables are either named or indexed members of objects.
Also local variables are just members of the frame.
This obviously leaves room for optimisations, as preliminary benchmarks show. But the benchmarks also
show that it is not such a big issue, and much more benefit can be achieved by inlining.

72
typed/typed.md Normal file

@ -0,0 +1,72 @@
---
layout: typed
title: Typed intermediate representation
---
### Disclaimer
The soml language was a stepping stone: it will go. The basic idea is good and will stay, but the
parser, and thus its existence as a standalone language, will go.
What will remain is traditionally called an intermediate representation, basically the layer that
the soml compiler compiles into. As such, these documents will be rewritten soon.
#### Top down designed language
Soml is a language that is designed to be compiled into, rather than written, unlike
other languages. It is the base for a higher system,
designed for the needs of compiling ruby. It is not an endeavor to abstract from a
lower level, like other system languages, namely of course c.
Still it is a system language, or an object machine language, so almost as low level a
language as possible. Only assembler is really lower, and it could be argued that assembler
is not really a language, rather a data format for expressing binary code.
##### Object oriented to the core, including calling convention
Soml is completely object oriented and strongly typed. Types are modelled as classes and carry
information about instance variable names and their basic type. *Every* object stores a reference
to its type, and while types are immutable, the reference may change. The basic types every
object is made up of include at least integer and reference (pointer).
The object model, ie the basic properties of objects that the system relies on, is quite simple
and explained in the runtime section. It involves a single reference per object.
Also the object memory model is kept quite simple, in that objects are always small multiples
of the cache line size of the hardware machine.
We use object encapsulation to build up larger looking objects from these basic blocks.
The calling convention is also object oriented, not stack based\*. Message objects are used to
define the data needed for invocation. They carry arguments, a frame and return addresses.
In Soml return addresses are pre-calculated and determined by the caller, and yes, there
are several. In fact there is one return address per basic type, plus one for exceptions.
A method invocation may thus be made to return to an entirely different location than the
caller.
\*(A stack, as used in c, is not typed and as such a source of problems.)
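
The idea of several caller-supplied return addresses can be sketched as a lookup from the kind of the returned value to an address. The kinds and address labels are hypothetical, and the `raise` only stands in for the error the compiler inserts for return types the caller does not handle.

```ruby
# Per-type return addresses: the caller supplies one address per basic
# type plus one for exceptions; the callee picks the one matching what it
# actually returns. Kinds and labels are hypothetical.
def return_kind(value)
  case value
  when Exception then :exception # checked first: exceptions are references too
  when Integer   then :integer
  else :reference
  end
end

# The address the callee jumps to for a given result.
def return_target(return_addresses, value)
  kind = return_kind(value)
  return_addresses.fetch(kind) { |k| raise "caller does not handle #{k} results" }
end
```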
There is no non-object-based memory in soml. The only global constants are instances of
classes that can be accessed by writing the class name in soml source.
##### Syntax and runtime
Soml syntax is a mix between ruby and c. It is like ruby in the sense that semicolons and even
newlines are not necessary unless they are. Soml still uses braces, but that will probably
be changed.
But of course it is typed, so in argument or variable definitions the type must be specified
like in c. Type names are the class names they represent, but "int" may be used for brevity
instead of Integer. Return types are also declared, though more for static analysis. As mentioned, a
function may return to different addresses according to type. The compiler automatically inserts
errors for return types that are not handled by the caller.
The complete syntax and its translation is discussed [here](syntax.html).
As soml is the base for dynamic languages, all compile information is recorded in the runtime.
All information is of course object oriented, ie in the form of objects. This means a class
hierarchy, and this itself is of course part of the runtime. The runtime, Parfait, is kept
to a minimum, currently around 15 classes, described in detail [here](parfait.html).
Historically Parfait has been coded in ruby, as it was first needed in the compiler.
This had the additional benefit of providing solid test cases for the functionality.
Currently the process is to convert the code into soml, using the same compiler used to compile
ruby.