ruby-x.github.io/app/views/pages/misc/soml_benchmarks.html.haml

= render "pages/misc/menu"

Disclaimer: read the intro of
=link_to "Soml" , "soml.html"
first. This is obsolete, and has only historic value.


%h1= title "Simple soml performance numbers"
%p
  These benchmarks were made to establish places for optimizations. This early on it is clear that
  performance is not outstanding, but still there were some surprises.
%ul
  %li loop  - program does empty loop of same size as hello
  %li hello - output hello world (to dev/null) to measure kernel calls (not terminal speed)
  %li itos  - convert integers from 1 to 100000  to string
  %li add   - run integer adds by linear fibonacci of 40
  %li call  - exercise calling by recursive fibonacci of 20
%p
  Hello and itos and add run 100_000 iterations per program invocation to remove startup overhead.
  Call only has 10000 iterations, as it is much slower, executing about 10000 calls per invocation
%p Gcc used to compile c on the machine. soml executables produced by ruby (on another machine)

%h2 Results
%p
  Results were measured by a ruby script. Mean and variance was measured until variance was low,
  always under one percent.
%p
  The machine was a virtual arm run on a powerbook, performance roughly equivalent to a raspberry pi.
  But results should be seen as relative, not absolute (some were scaled)
%p
  = image_tag "typed/bench.png" ,alt: "Graph"

%h2 Discussion
%p
  Surprisingly there are areas where soml code runs faster than c. Especially in the hello example this
  may not mean too much. Printf does caching and has a lot functionality, so it may not be a straight
  comparison. The loop example is surprising and needs to be examined.
%p
  The add example is slower because of the different memory model and lack of optimisation for soml.
  Every result of an arithmetic operation is immediately written to memory in soml, whereas c will
  keep things in registers as long as it can, which in the example is the whole time. This can
  be improved upon with register code optimisation, which can cut loads after writes and writes that
  that are overwritten before calls or jumps are made.
%p
  The call was expected to be larger as a typed model is used and runtime information (like the method
  name) made available. It is actually a small price to pay for the ability to generate code at runtime
  and will off course reduce drastically with inlining.
%p
  The itos example was also to be expected as it relies both on calling and on arithmetic. Also itos
  relies heavily on division by 10, which when coded in cpu specific assembler may easily be sped up
  by a factor of 2-3.
%p
  All in all the results are encouraging as no optimization efforts have been made. Off course the
  most encouraging fact is that the system works and thus may be used as the basis of a dynamic
  code generator, as opposed to having to interpret.
group the old and new ideas into misc typed (file) still needs work, but at least the typed directory is gone 2018-04-11 19:53:49 +02:00			`= render "pages/misc/menu"`

			`Disclaimer: read the intro of`
			`=link_to "Soml" , "soml.html"`
			`first. This is obsolete, and has only historic value.`
clean up typed pages 2018-04-11 18:43:29 +02:00

group the old and new ideas into misc typed (file) still needs work, but at least the typed directory is gone 2018-04-11 19:53:49 +02:00			`%h1= title "Simple soml performance numbers"`
move md to haml 2018-04-10 18:50:07 +02:00			`%p`
			`These benchmarks were made to establish places for optimizations. This early on it is clear that`
			`performance is not outstanding, but still there were some surprises.`
			`%ul`
			`%li loop - program does empty loop of same size as hello`
			`%li hello - output hello world (to dev/null) to measure kernel calls (not terminal speed)`
			`%li itos - convert integers from 1 to 100000 to string`
			`%li add - run integer adds by linear fibonacci of 40`
			`%li call - exercise calling by recursive fibonacci of 20`
			`%p`
			`Hello and itos and add run 100_000 iterations per program invocation to remove startup overhead.`
			`Call only has 10000 iterations, as it is much slower, executing about 10000 calls per invocation`
			`%p Gcc used to compile c on the machine. soml executables produced by ruby (on another machine)`
rename typed to soml 2018-04-11 18:58:57 +02:00
			`%h2 Results`
move md to haml 2018-04-10 18:50:07 +02:00			`%p`
			`Results were measured by a ruby script. Mean and variance was measured until variance was low,`
			`always under one percent.`
			`%p`
			`The machine was a virtual arm run on a powerbook, performance roughly equivalent to a raspberry pi.`
			`But results should be seen as relative, not absolute (some were scaled)`
			`%p`
rename typed to soml 2018-04-11 18:58:57 +02:00			`= image_tag "typed/bench.png" ,alt: "Graph"`

			`%h2 Discussion`
move md to haml 2018-04-10 18:50:07 +02:00			`%p`
			`Surprisingly there are areas where soml code runs faster than c. Especially in the hello example this`
			`may not mean too much. Printf does caching and has a lot functionality, so it may not be a straight`
			`comparison. The loop example is surprising and needs to be examined.`
			`%p`
			`The add example is slower because of the different memory model and lack of optimisation for soml.`
			`Every result of an arithmetic operation is immediately written to memory in soml, whereas c will`
			`keep things in registers as long as it can, which in the example is the whole time. This can`
			`be improved upon with register code optimisation, which can cut loads after writes and writes that`
			`that are overwritten before calls or jumps are made.`
			`%p`
			`The call was expected to be larger as a typed model is used and runtime information (like the method`
			`name) made available. It is actually a small price to pay for the ability to generate code at runtime`
			`and will off course reduce drastically with inlining.`
			`%p`
			`The itos example was also to be expected as it relies both on calling and on arithmetic. Also itos`
			`relies heavily on division by 10, which when coded in cpu specific assembler may easily be sped up`
			`by a factor of 2-3.`
			`%p`
			`All in all the results are encouraging as no optimization efforts have been made. Off course the`
			`most encouraging fact is that the system works and thus may be used as the basis of a dynamic`
			`code generator, as opposed to having to interpret.`