group the old and new ideas into misc

typed (file) still needs work, but at least the typed directory is gone
2018-04-11 20:53:49 +03:00
parent 4dcfddb270
commit a42ca6e514
12 changed files with 80 additions and 91 deletions
--- a/app/views/pages/misc/_menu.html.haml
+++ b/app/views/pages/misc/_menu.html.haml
@ -0,0 +1,5 @@
+.row
+  %ul.nav
+    %li= link_to "Threads (rumble)" , "threads.html"
+    %li= link_to "Optimisation (ideas)" , "optimisations.html"
+    %li= link_to "SOML (old)",  "soml.html"
--- a/app/views/pages/misc/bench.numbers
+++ b/app/views/pages/misc/bench.numbers
--- a/app/views/pages/misc/index.html.haml
+++ b/app/views/pages/misc/index.html.haml
@ -0,0 +1,12 @@
+= render "pages/misc/menu"
+
+%h1= title "Ideas, old and new"
+
+An idea is where everything starts. Like this project. But not every idea
+makes it to reality, and some that do, do not turn out as planned.
+
+This is a collection of both, the overripe and unripe. Maybe we can learn from the
+first, and by continuously reexamining the second see if it is worth the
+effort to bring them down to earth.
+
+%h2 Unrealized
--- a/app/views/pages/misc/optimisations.html.haml
+++ b/app/views/pages/misc/optimisations.html.haml
@ -0,0 +1,33 @@
+= render "pages/misc/menu"
+
+%h1=title "Optimisation ideas"
+
+%p I won’t manage to implement all of these ideas in the beginning, so I just jot them down.
+
+%h3 Inlining
+%p
+  Ok, this may not need too much explanation. Just work. It may be interesting to measure how much this saves, and how much
+  inlining is useful. I could imagine at some point it’s the register shuffling that determines the effort, not the
+  actual call.
+%p Again, the key is update notification when any of the inlined functions change.
+%p
+  And it is important to code the functions so that they have a single exit point, otherwise it gets messy. Up to now this
+  was quite simple, but then blocks and exceptions are not done yet.
+
+%h3 Register negotiation
+%p
+  This is a little less baked, but it comes from the same idea as inlining. As calling functions is a lot of register
+  shuffling, we could try to avoid some of that.
+%p More precisely, calling conventions usually have registers in which arguments are passed. And to call an “unknown”, i.e. any, function, some kind of convention is necessary.
+%p
+  But on “cached” functions, where the function is known, it is possible to do something else. And since we have the source
+  of the function around, we can do things previously impossible.
+%p One such thing may be to recompile the function to accept arguments exactly where they are in the calling function. Well, now that it’s written down, it does sound a lot like inlining, except without the inlining :-)
+%p
+  An expansion of this idea would be to have a Negotiator on every function call. Meaning that the calling function would not
+  do any shuffling, but instead call a Negotiator, and the Negotiator does the shuffling and calling of the function.
+  This only really makes sense if the register shuffling information is encoded in the Negotiator object (and does not have
+  to be passed).
+%p
+  Negotiators could do some counting and do the recompiling when it seems worth it. The Negotiator would then remove itself from
+  the chain and connect caller and new receiver directly. How much is in this I couldn’t say though.
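The Negotiator idea can be sketched in plain ruby. This is only an illustration under assumptions: CallSite, Negotiator, the mapping array and the threshold are hypothetical names, and "recompiling" is faked by building a lambda that takes the caller's layout directly. Nothing here is existing project code.

```ruby
# Hypothetical call site: holds whatever is currently the call target.
class CallSite
  attr_accessor :target

  def initialize(target = nil)
    @target = target
  end

  def call(*args)
    @target.call(*args)
  end
end

# Hypothetical Negotiator: knows the shuffle (mapping) itself, counts calls,
# and replaces itself at the call site once the threshold is reached.
class Negotiator
  attr_reader :count

  # mapping[i] is the position in the caller's layout that holds the
  # value for the function's i-th argument.
  def initialize(site, function, mapping, threshold: 3)
    @site = site
    @function = function
    @mapping = mapping
    @threshold = threshold
    @count = 0
  end

  def call(*caller_layout)
    @count += 1
    recompile if @count >= @threshold
    @function.call(*@mapping.map { |index| caller_layout[index] })
  end

  private

  # Stand-in for real recompilation: a callable that accepts arguments
  # where the caller has them. The Negotiator drops out of the chain.
  def recompile
    function = @function
    mapping = @mapping
    @site.target = ->(*layout) { function.call(*mapping.map { |i| layout[i] }) }
  end
end

subtract = ->(a, b) { a - b }
site = CallSite.new
negotiator = Negotiator.new(site, subtract, [1, 0])
site.target = negotiator

# The caller has (3, 10) in its layout, the function wants them swapped.
results = Array.new(5) { site.call(3, 10) }
```

After the third call the site points at the specialised lambda, so the last two calls never touch the Negotiator. Whether the counting pays for itself is exactly the open question from the text.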
--- a/app/views/pages/misc/soml.html.haml
+++ b/app/views/pages/misc/soml.html.haml
@ -0,0 +1,210 @@
+= render "pages/misc/menu"
+
+%h1= title "Soml Syntax"
+
+Soml was a step on the way. The year was 2015, and I was still thinking
+about VMs and c++.
+%br
+I had thought that some kind of typed layer was needed, and that one could
+implement the higher level in a language. A typed language, sort of like c++,
+that would be used to implement the code of ruby.
+%br
+A little like PyPy, which has a core that only uses a small subset of python that can,
+à la crystal, be fully type inferred.
+%br
+This idea turned out to be wrong, or difficult. Because the bridge between the
+typed and untyped was unclear, or I didn't get it to work. The current system
+works by typing self, but not its instances. A difficult mix to express in a
+language.
+%br
+This led me to abandon soml and rewrite the functionality as vool and mom layers.
+But Soml was working, the parser is still about and there are some interesting
+=link_to "benchmarks" , "soml_benchmarks.html"
+that came from it and really validated the calling convention.
+
+%h2 Top level Class and methods
+%p The top level declarations in a file may only be class definitions
+%pre
+  %code
+    :preserve
+      class Dictionary < Object
+        int add(Object o)
+          ... statements
+        end
+      end
+%p
+  The class hierarchy is explained in
+  = succeed "." do
+    %a{:href => "parfait.html"} here
+%p
+  Methods must be typed, both arguments and return. Generally class names serve as types, but “int” can
+  be used as a shortcut for Integer.
+%p
+  Code may not be outside method definitions, like in ruby. A compiled program starts at the builtin
+  method
+  = succeed "," do
+    %strong init
+  %strong Space.main
+%p
+  Classes are represented by class objects (instances of class Class to be precise) and methods by
+  Method objects, so all information is available at runtime.
+%h2 Expressions
+%p
+  Soml distinguishes between expressions and statements. Expressions have value, statements perform an
+  action. Both are compiled to Register level instructions for the current method. Generally speaking
+  expressions store their value in a register and statements store those values elsewhere, possibly
+  after operating on them.
+%p The subsections below correspond roughly to the parser's rule names.
+%p
+  %strong Basic expressions
+  are numbers (integer or float), strings or names, either variable, argument,
+  field or class names. (normal details applicable). Special names include self (the current
+  receiver), and message (the currently executed method frame). These all resolve to a register
+  with contents.
+%pre
+  %code
+    :preserve
+        23
+        "hi there"
+        argument_name
+        Object
+%p
+  A
+  %strong field access
+  resolves to the field's value at the time. Fields must be defined by
+  field definitions, and are basically instance variables, but not hidden (see below).
+  The example below shows how to define local variables at the same time. Notice chaining, both for
+  field access and call, is not allowed.
+%pre
+  %code
+    :preserve
+        Type l = self.type
+        Class  c = l.object_class
+        Word   n = c.name
+%p
+  A
+  %strong Call expression
+  is a method call that resolves to the method's return value. If no receiver is
+  specified, self (the current receiver) is used. The receiver may be any of the basic expressions
+  above, so also class instances. The receiver type is known at compile time, as are all argument
+  types, so the class of the receiver is searched for a matching method. Many methods of the same
+  name may exist, but to issue a call, an exact match for the arguments must be found.
+%pre
+  %code
+    :preserve
+        Class c = self.get_class()
+        c.get_super_class()
+%p
+  An
+  %strong operator expression
+  is a binary expression, with any of the other expressions as left
+  and right operand, and an operator symbol between them. Operand types must be integer.
+  The symbols allowed are normal arithmetic and logical operations.
+%pre
+  %code
+    :preserve
+         a + b
+         counter | 255
+         mask >> shift
+%p
+  Operator expressions may be used in assignments and conditions, but not in calls, where the result
+  would have to be assigned beforehand. This is one of those cases where soml’s low level approach
+  shines through, as soml has no auto-generated temporary variables.
+%h2 Statements
+%p
+  We have seen the top level statements above. In methods the most interesting statements relate to
+  flow control and specifically how conditionals are expressed. This differs somewhat from other
+  languages, in that the condition is expressed explicitly (not implicitly like in c or ruby).
+  This lets the programmer express more precisely what is tested, and also opens an extensible
+  framework for more tests than available in other languages. Specifically overflow may be tested in
+  soml, without dropping down to assembler.
+%p
+  An
+  %strong if statement
+  is started with the keyword if_ and then contains the branch type. The branch
+  type may be
+  = succeed "." do
+    %em plus, minus, zero, nonzero or overflow
+  %em If
+  may be continued with an
+  = succeed "," do
+    %em else
+  %em end
+%pre
+  %code
+    :preserve
+        if_zero(a - 5)
+          ....
+        else
+          ....
+        end
+%p
+  A
+  %strong while statement
+  is very much like an if, with of course the normal loop semantics, and
+  without the possible else.
+%pre
+  %code
+    :preserve
+        while_plus( counter )
+          ....
+        end
+%p
+  A
+  %strong return statement
+  returns a value from the current function. There are no void functions.
+%pre
+  %code
+    :preserve
+      return 5
+%p
+  A
+  %strong field definition
+  is to declare an instance variable on an object. It starts with the keyword
+  field, must be in class (not method) scope and may not be assigned to.
+%pre
+  %code
+    :preserve
+      class Class < Object
+        field List instance_methods
+        field Type object_type
+        field Word name
+        ...
+      end
+%p
+  A
+  %strong local variable definition
+  declares, and possibly assigns to, a local variable. Local variables
+  are stored in frame objects, in fact they are instance variables of the current frame object.
+  When resolving a name, the compiler checks argument names first, and then local variables.
+%pre
+  %code
+    :preserve
+      int counter = 0
+%p
+  Any of the expressions may be assigned to the variable at the time of definition. After a variable is
+  defined it may be assigned to with an
+  %strong assignment statement
+  any number of times. The assignment
+  is like an assignment during definition, without the leading type.
+%pre
+  %code
+    :preserve
+        counter = 0
+%p Any of the expressions, basic, call, operator, field access, may be assigned.
+%h2 Code generation and scope
+%p
+  Compiling generates two results simultaneously. The more obvious is code for a function, but also an
+  object structure of classes etc that capture the declarations. To understand the code part better
+  the register abstraction should be studied, and to understand the object structure the runtime.
+%p
+  The register machine abstraction is very simple, and so is the code generation, in favour of a simple
+  model. Especially in the area of register assignment, there is no magic and only a few simple rules.
+%p
+  The main one of those concerns main memory access ordering and states that object memory must
+  be consistent at the end of the statement. Since there is only object memory in soml, this
+  concerns all assignments, since all variables are either named or indexed members of objects.
+  Also local variables are just members of the frame.
+%p
+  This obviously does leave room for optimisations as preliminary benchmarks show. But benchmarks also
+  show that it is not such a big issue and much more benefit can be achieved by inlining.
--- a/app/views/pages/misc/soml_benchmarks.html.haml
+++ b/app/views/pages/misc/soml_benchmarks.html.haml
@ -0,0 +1,55 @@
+= render "pages/misc/menu"
+
+Disclaimer: read the intro of
+=link_to "Soml" , "soml.html"
+first. This is obsolete, and has only historic value.
+
+
+%h1= title "Simple soml performance numbers"
+%p
+  These benchmarks were made to establish places for optimizations. This early on it is clear that
+  performance is not outstanding, but still there were some surprises.
+%ul
+  %li loop  - program does empty loop of same size as hello
+  %li hello - output hello world (to /dev/null) to measure kernel calls (not terminal speed)
+  %li itos  - convert integers from 1 to 100000  to string
+  %li add   - run integer adds by linear fibonacci of 40
+  %li call  - exercise calling by recursive fibonacci of 20
+%p
+  Hello, itos and add run 100_000 iterations per program invocation to remove startup overhead.
+  Call only has 10000 iterations, as it is much slower, executing about 10000 calls per invocation.
+%p The c programs were compiled with gcc on the machine. The soml executables were produced by ruby (on another machine).
+
+%h2 Results
+%p
+  Results were measured by a ruby script. Mean and variance were measured until variance was low,
+  always under one percent.
+%p
+  The machine was a virtual arm run on a powerbook, performance roughly equivalent to a raspberry pi.
+  But results should be seen as relative, not absolute (some were scaled)
+%p
+  = image_tag "typed/bench.png" ,alt: "Graph"
+
+%h2 Discussion
+%p
+  Surprisingly there are areas where soml code runs faster than c. Especially in the hello example this
+  may not mean too much. Printf does caching and has a lot of functionality, so it may not be a straight
+  comparison. The loop example is surprising and needs to be examined.
+%p
+  The add example is slower because of the different memory model and lack of optimisation for soml.
+  Every result of an arithmetic operation is immediately written to memory in soml, whereas c will
+  keep things in registers as long as it can, which in the example is the whole time. This can
+  be improved upon with register code optimisation, which can cut loads after writes and writes
+  that are overwritten before calls or jumps are made.
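The first half of that optimisation, cutting loads after writes, can be sketched as a small peephole pass. This is an illustration only: the tuple instruction format and the name cut_loads_after_writes are assumptions made for the example, not the actual register layer.

```ruby
# Hypothetical instruction format (not the real register instructions):
#   [:store, reg, slot]   write a register to an object slot
#   [:load,  reg, slot]   read an object slot into a register
#   [:call,  name]        call; afterwards nothing is known about registers
#   [:jump / :label, ..]  control flow; knowledge is dropped as well
#   [op, dst, ...]        anything else overwrites register dst
def cut_loads_after_writes(instructions)
  slot_in = {} # slot => register known to hold the slot's current value
  instructions.each_with_object([]) do |ins, out|
    op, reg, slot = ins
    case op
    when :store
      slot_in[slot] = reg
      out << ins
    when :load
      known = slot_in[slot]
      next if known == reg                  # value already there, drop the load
      slot_in.delete_if { |_, r| r == reg } # reg is about to be overwritten
      if known
        out << [:move, reg, known]          # register copy instead of memory read
      else
        slot_in[slot] = reg
        out << ins
      end
    when :call, :jump, :label
      slot_in.clear
      out << ins
    else
      slot_in.delete_if { |_, r| r == reg } # destination register overwritten
      out << ins
    end
  end
end

code = [
  [:add,   :r1, :r1, :r2],
  [:store, :r1, :counter],
  [:load,  :r1, :counter], # redundant, r1 still holds counter
  [:load,  :r3, :counter], # becomes a register move
  [:call,  :puts],
  [:load,  :r1, :counter]  # must stay, the call may have changed anything
]
optimised = cut_loads_after_writes(code)
```

The pass is deliberately conservative: any call, jump or label forgets everything, which matches the rule that object memory must be consistent at those points.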
+%p
+  The call was expected to be larger as a typed model is used and runtime information (like the method
+  name) made available. It is actually a small price to pay for the ability to generate code at runtime
+  and will of course reduce drastically with inlining.
+%p
+  The itos example was also to be expected as it relies both on calling and on arithmetic. Also itos
+  relies heavily on division by 10, which when coded in cpu specific assembler may easily be sped up
+  by a factor of 2-3.
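One well-known way to speed up division by 10, shown here in ruby only to check the arithmetic, is to replace the division by a multiplication and a shift. The constant below is the usual one for unsigned 32 bit values; that the assembler version would use this particular trick is an assumption, and div10 is a name made up for the example.

```ruby
# ceil(2**35 / 10), the usual constant for unsigned 32 bit division by 10
MAGIC_TENTH = 0xCCCCCCCD

# Divide by 10 without a division instruction.
# Only valid for 0 <= n < 2**32.
def div10(n)
  (n * MAGIC_TENTH) >> 35
end

# Digits of a number, extracted the way an itos routine would.
def digits_of(n)
  digits = []
  loop do
    quotient = div10(n)
    digits.unshift(n - quotient * 10) # remainder without a second division
    n = quotient
    break if n.zero?
  end
  digits
end
```

On a cpu the 64 bit product would come from a long multiply instruction, and the shift by 35 is a shift of the high word by 3.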
+%p
+  All in all the results are encouraging as no optimization efforts have been made. Of course the
+  most encouraging fact is that the system works and thus may be used as the basis of a dynamic
+  code generator, as opposed to having to interpret.
--- a/app/views/pages/misc/threads.html.haml
+++ b/app/views/pages/misc/threads.html.haml
@ -0,0 +1,70 @@
+= render "pages/misc/menu"
+
+%h1= title "Threads are broken"
+
+%p
+  Having just read about ruby's threads, I was moved to collect my thoughts on the topic. How this will influence implementation
+  I am not sure yet. But it is good to get it out on paper as a basis for communication.
+%h3#processes Processes
+%p
+  I find it helps to consider why we have threads. Before threads, unix had only processes and ipc,
+  so inter-process-communication.
+%p
+  Processes were a good idea, keeping each program safe from the mistakes of others by restricting access to the process’s
+  own memory. Each process had the view of “owning” the machine, being alone on the machine as it were. Each a small turing/
+  von neumann machine.
+%p
+  But one had to wait for io, the network and so it was difficult, or even impossible to get one process to use the machine
+  to the hilt.
+%p
+  IPC mechanisms were and are sockets, shared memory regions, files, each with their own sets of strengths, weaknesses and
+  APIs, all deemed complicated and slow. Each exchange incurs a process switch and processes are not lightweight structures.
+%h3#thread Thread
+%p
+  And so threads were born as a lightweight mechanism for getting more things done. Concurrently, because when one
+  thread is in a kernel call, it is suspended and another can run.
+%h4#green-or-fibre Green or fibre
+%p
+  The first threads, which people did without kernel support, were quickly found not to solve the problem so well. Because as soon as any
+  thread calls the kernel, all threads stop. Not really that much won, one might think, but wrongly.
+%p
+  Now that Green threads are coming back in fashion as fibres they are used for lightweight concurrency, actor programming and
+  we find that the different viewpoint can help to express some solutions more naturally.
+%h4#kernel-threads Kernel threads
+%p
+  The real solution, where the kernel knows about threads and does the scheduling, took some while to become standard and
+  makes processes more complicated by a fair degree. Luckily we don’t code kernels and don’t have to worry.
+%p
+  But we do have to deal with the issues that come up. The issue is of course data corruption. I don’t even want to go into
+  how to fix this, or the different ways that have been introduced, because the main thrust becomes clear in the next chapter:
+%h3#broken-model Broken model
+%p
+  My main point about threads is that they are one of the worst hacks, especially in a c environment. Processes had a good
+  model of a program with a global memory. The equivalent of threads would have been shared memory with
+  %strong many
+  programs
+  connected. A nightmare. It even breaks that old turing idea and so it is very difficult to reason about what goes on in a
+  multi threaded program, and the only way this is achieved is by developing a more restrictive model.
+%p
+  In essence the thread memory model is broken. Ideally I would not like to implement it, or if implemented, at least fix it
+  first.
+%p But what is the fix? It is in essence what the process model was, i.e. each thread has its own memory.
+%h3#thread-memory Thread memory
+%p
+  In OO it is possible to fix the thread model, just because we have no global memory access. In effect the memory model
+  must be inverted: instead of almost all memory being shared by all threads and each thread having a small thread local
+  storage, threads must have mostly thread specific data and a small amount of shared resources.
+%p
+  A thread would thus work as a process used to. In essence it can update any data it sees without restrictions. It must
+  exchange data with other threads through specified global objects, that take the role of what ipc used to be.
+%p In an oo system this can be enforced by strict pass-by-value over thread borders.
+%p
+  The itc (inter thread communication) objects are the only ones that need current thread synchronization techniques.
+  The one mechanism that could cover all needs could be a simple list.
+%h3#rubyx RubyX
+%p
+  The original problem of what a program does during a kernel call could be solved by a very small number of kernel threads.
+  Any kernel call would be listed and “c” threads would pick them up to execute them and return the result.
+%p
+  All other threads could be managed as green threads. Threads may not share objects, other than a small number of system
+  provided ones.
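The inverted memory model can be modelled with today's ruby threads, as a sketch only. Mailbox and send_copy are hypothetical names, ruby itself does not enforce the separation, and the deep copy via Marshal merely stands in for strict pass-by-value over thread borders.

```ruby
# Hypothetical itc object: the only thing two threads share. It is a simple
# list (ruby's thread safe Queue) and copies everything that passes through.
class Mailbox
  def initialize
    @queue = Queue.new
  end

  # Pass-by-value: the receiver gets a deep copy, never the sender's object.
  def send_copy(object)
    @queue << Marshal.load(Marshal.dump(object))
  end

  # Blocks until a message is available.
  def receive
    @queue.pop
  end
end

requests = Mailbox.new
replies  = Mailbox.new

worker = Thread.new do
  order = requests.receive
  order[:items] << "invoice" # changes the worker's private copy only
  replies.send_copy(order)
end

original = { id: 1, items: ["book"] }
requests.send_copy(original)
answer = replies.receive
worker.join
```

The worker updates its data without any locking, because nothing it sees is reachable from another thread. Only the Mailbox needs synchronisation, and that is hidden inside the queue.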
--- a/app/views/pages/misc/typed.html.haml
+++ b/app/views/pages/misc/typed.html.haml
@ -0,0 +1,55 @@
+= render "pages/misc/menu"
+
+%h1= title "Typed intermediate representation"
+
+%p
+  Compilers use different intermediate representations to go from the source code to a binary,
+  which would otherwise be too big a step.
+%p
+  The
+  %strong typed
+  intermediate representation is a strongly typed layer, between the dynamically typed
+  ruby above, and the register machine below. One can think of it as a mix between c and c++,
+  minus the syntax aspect. While in 2015, this layer existed as a language, (see soml-parser), it
+  is now a tree representation only.
+%h2 Object oriented to the core, including calling convention
+%p
+  Types are modeled by the class Type and carry information about instance variable names
+  and their basic type.
+  %em Every object
+  stores a reference
+  to its type, and while
+  = succeed "," do
+    %strong types are immutable
+%p
+  The object model, ie the basic properties of objects that the system relies on, is quite simple
+  and explained in the runtime section. It involves a single reference per object.
+  Also the object memory model is kept quite simple in that object sizes are always small multiples
+  of the cache size of the hardware machine.
+  We use object encapsulation to build up larger looking objects from these basic blocks.
+%p
+  The calling convention is also object oriented, not stack based*. Message objects are used to
+  define the data needed for invocation. They carry arguments, a frame and return address.
+  The return address is pre-calculated and determined by the caller, so
+  a method invocation may thus be made to return to an entirely different location.
+  *(A stack, as used in c, is not typed, not object oriented, and as such a source of problems)
+%p
+  There is no non-object based memory at all. The only global constants are instances of
+  classes that can be accessed by writing the class name in ruby source.
+%h2 Runtime / Parfait
+%p
+  The typed representation layer depends on the higher layer to actually determine and instantiate
+  types (type objects, or objects of class Type). This includes method arguments and local variables.
+%p
+  The typed layer is mainly concerned with defining TypedMethods, for which arguments and local variables
+  have a specified type (like in c). Basic Type names are the class names they represent,
+  but “int” may be used for brevity
+  instead of Integer.
+%p
+  The runtime, Parfait, is kept
+  to a minimum, currently around 15 classes, described in detail
+  = succeed "." do
+    %a{:href => "parfait.html"} here
+%p
+  Historically Parfait has been coded in ruby, as it was first needed in the compiler.
+  This had the additional benefit of providing solid test cases for the functionality.