ruby-x.github.io/soml/benchmarks.md

55 lines
2.5 KiB
Markdown
Raw Normal View History

---
layout: soml
title: Simple soml performance numbers
---
These benchmarks were made to establish places for optimizations. This early on it is clear that
performance is not outstanding, but still there were some surprises.
- loop - program does empty loop of same size as hello
- hello - output hello world (to dev/null) to measure kernel calls (not terminal speed)
- itos - convert integers from 1 to 100000 to string
- add - run integer adds by linear fibonacci of 40
- call - exercise calling by recursive fibonacci of 20
2015-11-27 21:20:21 +01:00
Hello and itos and add run 100_000 iterations per program invocation to remove startup overhead.
Call only has 10000 iterations, as it is much slower, executing about 10000 calls per invocation
Gcc used to compile c on the machine. soml executables produced by ruby (on another machine)
### Results
Results were measured by a ruby script. Mean and variance was measured until variance was low,
always under one percent.
The machine was a virtual arm run on a powerbook, performance roughly equivalent to a raspberry pi.
2015-11-27 21:20:21 +01:00
But results should be seen as relative, not absolute (some were scaled)
2015-11-27 21:20:21 +01:00
![Graph](bench.png)
### Discussion
Surprisingly there are areas where soml code runs faster than c. Especially in the hello example this
may not mean too much. Printf does caching and has a lot functionality, so it may not be a straight
comparison. The loop example is surprising and needs to be examined.
The add example is slower because of the different memory model and lack of optimisation for soml.
Every result of an arithmetic operation is immediately written to memory in soml, whereas c will
keep things in registers as long as it can, which in the example is the whole time. This can
be improved upon with register code optimisation, which can cut loads after writes and writes that
that are overwritten before calls or jumps are made.
The call was expected to be larger as a typed model is used and runtime information (like the method
name) made available. It is actually a small price to pay for the ability to generate code at runtime
and will off course reduce drastically with inlining.
The itos example was also to be expected as it relies both on calling and on arithmetic. Also itos
relies heavily on division by 10, which when coded in cpu specific assembler may easily be sped up
by a factor of 2-3.
All in all the results are encouraging as no optimization efforts have been made. Off course the
most encouraging fact is that the system works and thus may be used as the basis of a dynamic
code generator, as opposed to having to interpret.