small fixes
parent
48e4c8a27d
commit
de5f4c85cb
53
soml/soml.md
@ -9,28 +9,30 @@ title: Salama object machine language
|
||||
Soml is a language that is designed to be compiled into, rather than written, like
|
||||
other languages. It is the base for a higher system,
|
||||
designed for the needs of compiling ruby. It is not an endeavour to abstract from a
|
||||
lower level, like other system languages, namely off course c.<br/>
|
||||
lower level, like other system languages, namely of course c.
|
||||
|
||||
Still it is a system language, or an object machine language, so almost as low level a
|
||||
language as possible. Only assembler is really lower, and it could be argued that assembler
|
||||
is not really a language, rather a data format for expressing binary code. <br/>
|
||||
is not really a language, rather a data format for expressing binary code.
|
||||
|
||||
|
||||
##### Object oriented to the core, including calling convention
|
||||
|
||||
Soml is completely object oriented and strongly typed. For types, the classes are used, but
|
||||
the main distinction is between object (references) and integers. This is off course
|
||||
essential as dereferencing integers is what we want to avoid.
|
||||
Soml is completely object oriented and strongly typed. Types are modelled as classes and carry
|
||||
information about instance variable names and their basic type. *Every* object stores a reference
|
||||
to its type, and while types are immutable, the reference may change. The basic types every
|
||||
object is made up of include at least integer and reference (pointer).
|
||||
|
||||
The object model, i.e. the basic properties of objects that the system relies on, is quite simple
|
||||
and explained in the runtime section. It involves a single reference per object. <br/>
|
||||
Also the object memory
|
||||
model is kept quite simple in that objects are always small multiples of the cache size of the
|
||||
hardware machine. We use object encapsulation to build up larger looking objects from these
|
||||
basic blocks.
|
||||
and explained in the runtime section. It involves a single reference per object.
|
||||
Also the object memory model is kept quite simple in that objects are always small multiples
|
||||
of the cache size of the hardware machine.
|
||||
We use object encapsulation to build up larger looking objects from these basic blocks.
|
||||
|
||||
The calling convention is also object oriented, not stack based*. Message objects are used to
|
||||
define the data needed for invocation. They carry arguments, a frame and return addresses.
|
||||
In Soml return addresses are pre-calculated and determined by the caller, and yes, there
|
||||
are several. In fact there is one return address per masic type, plus one for exception.
|
||||
are several. In fact there is one return address per basic type, plus one for exception.
|
||||
A method invocation may thus be made to return to an entirely different location than the
|
||||
caller.
|
||||
\*(A stack, as used in c, is not typed and as such a source of problems)
|
||||
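The calling convention described above can be illustrated with a small Python model. This is only a sketch under assumptions: `Message`, `invoke`, `basic_type`, `half` and `call_half` are hypothetical names, callables stand in for pre-calculated return addresses, and only the two basic types named in the text (integer and reference) are modelled. It is not Soml's actual implementation.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Message:
    """Hypothetical stand-in for a Soml message object."""
    arguments: List[Any]
    frame: Dict[str, Any] = field(default_factory=dict)
    # One return target per basic type, plus one for exceptions.
    # Callables play the role of return addresses chosen by the caller.
    returns: Dict[str, Callable[[Any], str]] = field(default_factory=dict)


def basic_type(value: Any) -> str:
    # Assumption: everything that is not an integer counts as a reference.
    return "integer" if isinstance(value, int) else "reference"


def invoke(method: Callable[[Message], Any], message: Message) -> str:
    """Run the method, then continue at the return target matching the result."""
    try:
        result = method(message)
    except Exception as error:
        return message.returns["exception"](error)
    return message.returns[basic_type(result)](result)


def half(message: Message) -> Any:
    """Example callee: returns an integer or a reference (a string here)."""
    n = message.arguments[0]
    if n % 2:
        return "odd input"
    return n // 2


def call_half(n: Any) -> str:
    # The caller decides up front where each kind of result continues.
    message = Message(
        arguments=[n],
        returns={
            "integer": lambda v: f"integer path: {v}",
            "reference": lambda v: f"reference path: {v}",
            "exception": lambda e: f"exception path: {type(e).__name__}",
        },
    )
    return invoke(half, message)
```

Calling `call_half(8)` continues on the integer path, `call_half(3)` on the reference path, and `call_half("x")` on the exception path, so one invocation can resume at three different places in the caller.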
@ -41,22 +43,23 @@ classes that can be accessed by writing the class name in soml source.
|
||||
##### Syntax and runtime
|
||||
|
||||
Soml syntax is a mix between ruby and c. It is like ruby in the sense that semicolons and even
|
||||
newlines are not neccessary unless they are. It still uses braces, but that will probably
|
||||
be changed. <br/>
|
||||
newlines are not necessary unless they are. Soml still uses braces, but that will probably
|
||||
be changed.
|
||||
|
||||
But of course it is typed, so in argument or variable definitions the type must be specified
|
||||
like in c. Types are classes, but int may be used for brevity instead of Integer. Return
|
||||
types are also declared, though more for statci analysis. As mentioned any function may return
|
||||
to differernt addresses according to type. The compiler automatically inserts erros for
|
||||
return typesa that are not handled by the caller. <br/>
|
||||
The complete syntax and their translation is discussed <a href="syntax.html"> here </a>
|
||||
like in c. Type names are the class names they represent, but "int" may be used for brevity
|
||||
instead of Integer. Return types are also declared, though more for static analysis. As mentioned, a
|
||||
function may return to different addresses according to type. The compiler automatically inserts
|
||||
errors for return types that are not handled by the caller.
|
||||
The complete syntax and its translation are discussed [here](syntax.html).
|
||||
|
||||
As soml is the base for dynamic languages, all compile information is recorded in the runtime.
|
||||
All inforamtion is off course object oriented, ie in the form off objects. This means a class
|
||||
hierachy and this itself is off course part of the runtime. The runtime, Parfait, is kept
|
||||
to a minnimum, currently around 15 classes, described in detail <a href="parfait.html">
|
||||
here </a>. <br/>
|
||||
All information is of course object oriented, i.e. in the form of objects. This means a class
|
||||
hierarchy, and this itself is of course part of the runtime. The runtime, Parfait, is kept
|
||||
to a minimum, currently around 15 classes, described in detail [here](parfait.html).
|
||||
|
||||
|
||||
Historically Parfait has been coded in ruby, as it was first needed in the compiler.
|
||||
This had the additional benefit of providing solid test cases for the functionality.
|
||||
Currently the process is to recode the same functionality in soml, and by the end of that
|
||||
a converter will be written. This will convert the soml code into ruby code, thus removing the
|
||||
duplication.
|
||||
Currently the process is to convert the code into soml, using the same compiler used to compile
|
||||
ruby.
|
||||
|
@ -14,17 +14,17 @@ The top level declarations in a file may only be class definitions
|
||||
end
|
||||
end
|
||||
|
||||
The class hierarchy is explained in [here](./parfait.html), but you can leave out the superclass
|
||||
The class hierarchy is explained [here](parfait.html), but you can leave out the superclass
|
||||
and Object will be assumed.
|
||||
|
||||
Methods must be typed, both arguments and return. Generally class names serve as types, but int can
|
||||
Methods must be typed, both arguments and return. Generally class names serve as types, but "int" can
|
||||
be used as a shortcut for Integer.
|
||||
|
||||
Code may not be outside method definitions, like in ruby. A compiled program starts at the builtin
|
||||
method __init__, that does the inital setup, an then jumps to Object.main
|
||||
method __init__, that does the initial setup, and then jumps to **Space.main**.
|
||||
|
||||
Classes are represented by class objects and methods my Method objects, so all information is available
|
||||
at runtime.
|
||||
Classes are represented by class objects (instances of class Class to be precise) and methods by
|
||||
Method objects, so all information is available at runtime.
|
||||
|
||||
#### Expressions
|
||||
|
||||
@ -33,6 +33,8 @@ action. Both are compiled to Register level instructions for the current method.
|
||||
expressions store their value in a register and statements store those values elsewhere, possibly
|
||||
after operating on them.
|
||||
|
||||
The subsections below correspond roughly to the parser's rule names.
|
||||
|
||||
**Basic expressions** are numbers (integer or float), strings or names, either variable, argument,
|
||||
field or class names. (normal details applicable). Special names include self (the current
|
||||
receiver), and message (the currently executed method frame). These all resolve to a register
|
||||
@ -82,9 +84,9 @@ This lets the programmer express more precisely what is tested, and also opens a
|
||||
framework for more tests than available in other languages. Specifically overflow may be tested in
|
||||
soml, without dropping down to assembler.
|
||||
|
||||
And **if statement** is started with the keyword if_ and then contains the branch type. The branch
|
||||
type may be plus, minus, zero, nonzero or overflow. The condition must be in brackets and be any
|
||||
expression. If may be continued with en else, but doesn't have to be, and is ended with end
|
||||
An **if statement** is started with the keyword if_ and then contains the branch type. The branch
|
||||
type may be *plus, minus, zero, nonzero or overflow*. The condition must be in brackets and can be
|
||||
any expression. *If* may be continued with an *else*, but doesn't have to be, and is ended with *end*.
|
||||
|
||||
if_zero(a - 5)
|
||||
....
|
||||
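To make the branch types concrete, here is a rough Python model of which branch types hold for the result of a condition expression. It rests on assumptions the text does not state: a 32 bit signed word (`WORD_BITS`), *plus* meaning non-negative (so zero counts as plus), and *overflow* meaning the exact result does not fit the word. The names `branch_types` and `branch_taken` are made up for this sketch.

```python
WORD_BITS = 32  # assumed machine word size; the source does not state one


def branch_types(result: int) -> set:
    """Return the branch types that hold for an exact (untruncated) result."""
    low = -(1 << (WORD_BITS - 1))
    high = (1 << (WORD_BITS - 1)) - 1
    holds = set()
    # Overflow: the exact result is not representable in a signed word.
    if not low <= result <= high:
        holds.add("overflow")
    # The remaining tests look at the value as the machine would store it.
    wrapped = (result - low) % (1 << WORD_BITS) + low
    holds.add("zero" if wrapped == 0 else "nonzero")
    # Assumption: plus means non-negative, so zero is also plus.
    holds.add("minus" if wrapped < 0 else "plus")
    return holds


def branch_taken(branch_type: str, result: int) -> bool:
    """Model of if_<branch_type>(expression): is the if branch taken?"""
    return branch_type in branch_types(result)
```

With this model `branch_taken("zero", a - 5)` is true exactly when `a` is 5, which mirrors the `if_zero(a - 5)` example, and `branch_taken("overflow", x + y)` shows the kind of test that would otherwise need assembler.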
@ -114,14 +116,14 @@ field, must be in class (not method) scope and may not be assigned to.
|
||||
...
|
||||
end
|
||||
|
||||
A **local variable definition** declares and possibly assign to a local variable. Local variables
|
||||
are store in frame objects and the are last in search order. When resolving a name, the compiler
|
||||
checks argument names first, and then local variables.
|
||||
A **local variable definition** declares, and possibly assigns to, a local variable. Local variables
|
||||
are stored in frame objects; in fact they are instance variables of the current frame object.
|
||||
When resolving a name, the compiler checks argument names first, and then local variables.
|
||||
|
||||
int counter = 0
|
||||
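The lookup order for names can be sketched in a few lines of Python. This is an illustration only: `resolve` and its return shape are hypothetical, and plain lists of names stand in for the message's arguments and the frame's instance variables.

```python
from typing import List, Tuple


def resolve(name: str, argument_names: List[str],
            frame_names: List[str]) -> Tuple[str, int]:
    """Resolve a name the way the text describes: arguments first, then locals.

    Returns the kind of slot and its index within that kind.
    """
    if name in argument_names:
        return ("argument", argument_names.index(name))
    if name in frame_names:
        # Locals are modelled as instance variables of the frame object.
        return ("local", frame_names.index(name))
    raise NameError(f"unknown name: {name}")
```

For example, `resolve("counter", ["limit"], ["counter"])` finds a local, while a name that is both an argument and a local resolves to the argument, because arguments are checked first.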
|
||||
Any of the expression may be assigned to the variable at the time of definition. After a variable is
|
||||
defined it may be assigned to with an **assignemnt statement** any number of times. The assignment
|
||||
Any of the expressions may be assigned to the variable at the time of definition. After a variable is
|
||||
defined it may be assigned to with an **assignment statement** any number of times. The assignment
|
||||
is like an assignment during definition, without the leading type.
|
||||
|
||||
counter = 0
|
||||
@ -130,7 +132,7 @@ Any of the expressions, basic, call, operator, field access, may be assigned.
|
||||
|
||||
### Code generation and scope
|
||||
|
||||
Compiling generates two results simultaneously. The more obvious code for a function, but also an
|
||||
Compiling generates two results simultaneously. The more obvious is the code for a function, but also an
|
||||
object structure of classes etc. that captures the declarations. To understand the code part better
|
||||
the register abstraction should be studied, and to understand the object structure the runtime.
|
||||
|
||||
@ -142,5 +144,5 @@ be consistent at the end of the statement. Since there is only only object memor
|
||||
concerns all assignments, since all variables are either named or indexed members of objects.
|
||||
Also local variables are just members of the frame.
|
||||
|
||||
This obviously does leave room for optimisation as preliminary benchmarks show. But benchmarks also
|
||||
This obviously does leave room for optimisations, as preliminary benchmarks show. But benchmarks also
|
||||
show that it is not such a big issue and much more benefit can be achieved by inlining.
|
||||
|