%p
While trying to figure out what I am coding, I had to tackle this storage format sooner than I wanted to. The
immediate need is for code dumps that are concise but readable. I started with YAML, but that just takes
too many lines, so it’s too difficult to see what is going on.
%p
I just finished it. It’s a sort of condensed YAML I call sof (salama object file), but I want to take a
moment to reflect on why I did this, what the bigger picture is, and where sof may go.
%h3#program-lifecycle Program lifecycle
%p
Let’s take a step back to mother Smalltalk: there was the image. The image was/is the state of all the
objects in the system. Even threads, everything. Absolute object thinking taken to the ultimate.
A great idea of course, but doomed to ultimately fail because no man is an island (so no VM is either).
%h4#development Development
%p
Software development is a team sport, a social activity at its core. This is not always realised
when the focus is too much on the outcome, but when you look at it, everything is done in teams.
%p
The other thing not really taken into account in the standard development model is that it is a process in
time that really only gets juicy with a first customer-released version. Then you get into branches for bugs
and features, versions with major and minor, and before long you’re in a jungle of code.
%h4#code-centered Code centered
%p
But all that effort is concentrated on code. Ok, nowadays schema evolution is part of the game, so the
existence of data is acknowledged, but only as an external thing. Nowhere near that Smalltalk model.
%p
But of course a truly object-oriented program is not just code. It’s data too. Maybe currently “just”
configuration and enums/constants and locales, but that is exactly my point.
%p
The lack of defined data/object storage is holding us back, making all our programs fruit flies.
I mean a program lives a short time and dies. It has no way of “learning”, of accumulating data/knowledge
to use in a next invocation.
%h4#optimisation-example Optimisation example
%p
Let’s take optimisation as an example. So a developer runs tests (ruby-prof/valgrind or something)
with some output and makes program changes accordingly. But there are two obvious problems.
Firstly, the data is collected in development, not production. Secondly, and more importantly, a person is
needed.
%p
Of course a program could quite easily monitor itself, possibly over a long time, possibly only when
not at peak load. And surely some optimisations could be automated: a bit like the O1 .. On compiler
switches, more and more effort could be exerted on critical regions. Possibly all the way to
super-optimisation.
%p
But even if we did this, and a program would improve/jit itself, the fruits of this work are only usable
during that run of that program. Future invocations, just like future versions of that program, do not
benefit. And thus they start again, just like in Groundhog Day.
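%p
As a minimal sketch of what “benefiting in the next invocation” could look like, here is a program that keeps
its call counts in a text file between runs. It assumes plain YAML as a stand-in for sof, and the RunStats
class and the run_stats.yml file name are hypothetical, not part of salama.

```ruby
require "yaml"
require "tmpdir"

# Hypothetical helper, not part of salama: accumulates call counts and
# keeps them in a text file so the next invocation can build on them.
class RunStats
  attr_reader :counts

  def initialize(path)
    @path = path
    # Start from what earlier runs collected, or from nothing on the first run.
    loaded = File.exist?(path) ? YAML.safe_load(File.read(path)) : nil
    @counts = loaded || {}
  end

  # Record one call of the named method.
  def hit(name)
    @counts[name] = @counts.fetch(name, 0) + 1
  end

  # Names sorted by accumulated call count, hottest first.
  def hottest(n = 3)
    @counts.sort_by { |name, count| [-count, name] }.first(n).map(&:first)
  end

  # Write keys in sorted order so the file diffs cleanly under version control.
  def save
    File.write(@path, @counts.sort.to_h.to_yaml)
  end
end

# Two "invocations" of the same program sharing one stats file.
def simulate_runs(dir)
  path = File.join(dir, "run_stats.yml")

  first = RunStats.new(path)
  3.times { first.hit("parse") }
  first.hit("emit")
  first.save

  second = RunStats.new(path) # a later run starts with the old counts
  second.hit("emit")
  second.save
  second
end

stats = Dir.mktmpdir { |dir| simulate_runs(dir) }
puts stats.hottest(1).inspect
```

%p
The second run sees the counts of the first, so whatever decides where to spend optimisation effort can work
from accumulated data rather than from a blank slate.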
%h3#storage Storage
%p
So to make that optimisation example work, we would need storage. Theoretically we could make the program
change its own executable/object files, in Ruby even its source. But only theoretically, as we have no
representation of the code to work on.
%p
In salama we do have an internal representation, both at the code level (ast) and the compiled code level
(CompiledMethod, Instructions and friends).
%h4#storage-format Storage Format
%p
Going back to the image, we can ask why it was doomed to fail: because of the binary,
proprietary implementation. Not because of the idea as such.
%p
Binary data needs a rigorous specification and/or software to work on it. Work, what work?
We need to merge the data between installations, maintain versions and branches. That sounds a lot like
version control, because it basically is. Of course this “could” have been solved by the Smalltalk
people, but wasn’t. I think it’s fair to say that git was the first system to solve that problem.
%p
And git of course works with diff, and so for a 3-way merge to be successful we need a text format.
Which is why I started with YAML, and which is why sof is also text-based.
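%p
To illustrate why a line-oriented text format suits diff and merge, here is a rough sketch that flattens an
object graph into one line per leaf value. The “path = value” layout is made up for this example and is not
the actual sof syntax. The Point and Shape structs are hypothetical too.

```ruby
# Hypothetical example objects, not salama classes.
Point = Struct.new(:x, :y)
Shape = Struct.new(:name, :origin, :tags)

# Flatten an object graph into "path = value" lines, one leaf per line.
# NOTE: this layout is invented for illustration, it is not sof.
def dump_lines(value, path = "root")
  case value
  when Struct
    value.each_pair.flat_map { |key, member| dump_lines(member, "#{path}.#{key}") }
  when Array
    value.each_with_index.flat_map { |item, i| dump_lines(item, "#{path}[#{i}]") }
  else
    ["#{path} = #{value.inspect}"]
  end
end

# Lines present in the new dump but not in the old one:
# a crude stand-in for what a line-based diff would report.
def changed_lines(old_dump, new_dump)
  new_dump - old_dump
end

old_shape = Shape.new("box", Point.new(1, 2), ["red", "small"])
new_shape = Shape.new("box", Point.new(1, 5), ["red", "small"])

puts dump_lines(old_shape)
puts "changed:"
puts changed_lines(dump_lines(old_shape), dump_lines(new_shape))
```

%p
Changing one attribute changes exactly one line, which is the property a 3-way merge relies on: two
installations that touch different attributes produce changes on different lines.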
%p The other benefit is of course human readability.
%p
So now we have an object file * format in text, and we have git. What we do with it is up to us.
(* well, I only finished the writer. Reading/parsing is “left as an exercise for the reader” :-)
%h4#sof-as-object-file-format Sof as object file format
%p
Ok, I’ll sketch it a little: Salama would use sof as its object file format, and only sof would ever be
stored in git. For developers to work, tools would create source and, when that is edited, compile it to sof.
%p
A program would be a repository of sof and resource files. Some convention for load order would be helpful,
as would some “area” where programs may collect data or changes to the program. Some may of course alter the
sof files directly.
%p
How, when and how automatically changes are merged (via git) is up to developer policy. But it is
easily imaginable that data in program-designated areas gets merged back into the “mainstream” automatically.