= render "pages/soml/menu"
%h1= title "Simple soml performance numbers"
%p
  These benchmarks were made to identify places for optimization. This early on it is clear that
  performance is not outstanding, but there were still some surprises.
%ul
  %li loop - runs an empty loop with the same number of iterations as hello
  %li hello - outputs hello world (to /dev/null) to measure kernel calls (not terminal speed)
  %li itos - converts the integers from 1 to 100000 to strings
  %li add - runs integer adds by linear fibonacci of 40
  %li call - exercises calling by recursive fibonacci of 20
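:markdown
  As a rough illustration of the add and call workloads, here is a minimal Ruby sketch of a linear and a recursive fibonacci. The actual soml and c benchmark sources are not shown on this page, so the method names `fib_linear` and `fib_recursive` and the choice of base cases are assumptions.

  ```ruby
  # Linear (iterative) fibonacci: one integer add per step,
  # which is the kind of work the "add" benchmark exercises.
  # Hypothetical name, not taken from the benchmark sources.
  def fib_linear(n)
    a, b = 0, 1
    n.times { a, b = b, a + b }
    a
  end

  # Recursive fibonacci: nearly all of the work is method calls,
  # which is the kind of work the "call" benchmark exercises.
  # Base cases (fib(0) = 0, fib(1) = 1) are an assumption.
  def fib_recursive(n)
    return n if n < 2
    fib_recursive(n - 1) + fib_recursive(n - 2)
  end
  ```

  The linear version grows with n, while the recursive version grows with the number of calls, which is why the two are useful for separating arithmetic cost from calling cost.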
%p
  Hello, itos and add run 100_000 iterations per program invocation to remove startup overhead.
  Call only has 10000 iterations, as it is much slower, executing about 10000 calls per invocation.
%p C was compiled with gcc on the test machine. The soml executables were produced by ruby (on another machine).
%h2 Results
%p
  Results were measured by a ruby script. Mean and variance were measured until the variance was low,
  always under one percent.
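:markdown
  The measurement script itself is not shown here. A minimal sketch of such a loop, assuming "variance under one percent" means the standard deviation relative to the mean, might look like this. The names `mean_and_variance` and `measure` and the `min_runs`, `max_runs` and `limit` parameters are hypothetical.

  ```ruby
  # Mean and (population) variance of an array of timings.
  # Hypothetical helper, not the original script.
  def mean_and_variance(samples)
    mean = samples.sum / samples.size.to_f
    variance = samples.sum { |s| (s - mean)**2 } / samples.size
    [mean, variance]
  end

  # Times the given block repeatedly until the standard deviation is
  # below `limit` (1% by default) of the mean, or `max_runs` is reached.
  # The stopping rule is an assumption about what "low variance" meant.
  def measure(min_runs: 5, max_runs: 100, limit: 0.01)
    samples = []
    loop do
      start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
      yield
      samples << Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
      next if samples.size < min_runs
      mean, variance = mean_and_variance(samples)
      stable = Math.sqrt(variance) / mean < limit
      return [mean, variance] if stable || samples.size >= max_runs
    end
  end
  ```

  A benchmark binary could then be timed with something like `measure { system("./hello > /dev/null") }`, where `./hello` is a placeholder for the executable under test.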
%p
  The machine was a virtual arm run on a powerbook, with performance roughly equivalent to a raspberry pi.
  But results should be seen as relative, not absolute (some were scaled).
%p
  = image_tag "typed/bench.png", alt: "Graph"
%h2 Discussion
%p
  Surprisingly there are areas where soml code runs faster than c. In the hello example especially, this
  may not mean too much. Printf does caching and has a lot of functionality, so it may not be a straight
  comparison. The loop example is surprising and needs to be examined.
%p
  The add example is slower because of the different memory model and the lack of optimisation for soml.
  Every result of an arithmetic operation is immediately written to memory in soml, whereas c will
  keep things in registers as long as it can, which in the example is the whole time. This can
  be improved upon with register code optimisation, which can cut loads after writes and writes
  that are overwritten before calls or jumps are made.
%p
  The call overhead was expected to be larger, as a typed model is used and runtime information (like the
  method name) is made available. It is actually a small price to pay for the ability to generate code at
  runtime and will of course reduce drastically with inlining.
%p
  The itos result was also to be expected, as it relies both on calling and on arithmetic. Itos also
  relies heavily on division by 10, which, when coded in cpu-specific assembler, may easily be sped up
  by a factor of 2-3.
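:markdown
  One well-known way to speed up division by 10 is to replace it with a multiplication and a shift. The sketch below shows the idea for unsigned 32-bit values in Ruby. It is not necessarily the optimisation intended here, and the names `MAGIC`, `div10` and `itos` are hypothetical.

  ```ruby
  # Multiply-and-shift replacement for n / 10, valid for 0 <= n < 2**32.
  # MAGIC is ceil(2**35 / 10.0); the technique is swapped in for
  # illustration and is an assumption about what assembler code would do.
  MAGIC = 0xCCCCCCCD

  def div10(n)
    (n * MAGIC) >> 35
  end

  # Integer to string conversion that avoids the division instruction.
  def itos(n)
    return "0" if n.zero?
    digits = []
    while n > 0
      q = div10(n)
      digits << (n - q * 10) # remainder without a second division
      n = q
    end
    digits.reverse.join
  end
  ```

  In assembler the multiplication would be a single widening multiply and the remainder a multiply-subtract, so each digit costs a few cheap instructions instead of a division.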
%p
  All in all the results are encouraging, as no optimization efforts have been made. Of course the
  most encouraging fact is that the system works and thus may be used as the basis of a dynamic
  code generator, as opposed to having to interpret.