add a object storage post

2014-08-19 15:17:12 +03:00
parent 9a88476c67
commit b2fece1926
1 changed files with 94 additions and 0 deletions
@@ -0,0 +1,94 @@
+---
+layout: news
+author: Torsten
+---
+
+While trying to figure out what I am coding, I had to attack this storage format earlier than I wanted to. The
+immediate need is for code dumps that are concise but readable. I started with yaml, but that just takes
+too many lines, so it's too difficult to see what is going on.
+
+I just finished it. It's a sort of condensed yaml I call sof (salama object file), but I want to take the
+moment to reflect on why I did this, what the bigger picture is, and where sof may go.
+
+### Program lifecycle
+
+Let's take a step back to mother smalltalk: there was the image. The image was/is the state of all the
+objects in the system. Even threads, everything. Absolute object thinking taken to the ultimate.
+A great idea of course, but doomed to ultimately fail because no man is an island (and no vm is either).
+
+#### Development
+
+Software development is a team sport, a social activity at its core. This is not always realised
+when the focus is too much on the outcome, but when you look at it, everything is done in teams.
+
+The other thing not really taken into account in the standard development model is that it is a process in
+time that really only gets juicy with a first customer-released version. Then you get into branches for bugs
+and features, versions with major and minor numbers, and before long you're in a jungle of code.
+
+#### Code centered
+ 
+But all that effort is concentrated on code. Ok, nowadays schema evolution is part of the game, so the
+existence of data is acknowledged, but only as an external thing. Nowhere near that smalltalk model.
+
+But of course a truly object oriented program is not just code. It's data too. Maybe currently "just"
+configuration and enums/constants and locales, but that is exactly my point.
+
+The lack of defined data/object storage is holding us back, making all our programs fruit-flies.
+By that I mean a program lives a short time and dies. It has no way of "learning", of accumulating
+data/knowledge to use in a next invocation.
+
+#### Optimisation example
+
+Let's take optimisation as an example. So a developer runs tests (ruby-prof/valgrind or something)
+with some output and makes program changes accordingly. But there are two obvious problems.
+Firstly, the data is collected in development, not production. Secondly, and more importantly, a person is
+needed.
+
+Of course a program could quite easily monitor itself, possibly over a long time, possibly only when
+not at peak load. And surely some optimisations could be automated: a bit like the O1 .. On compiler
+switches, more and more effort could be exerted on critical regions. Possibly all the way to
+super-optimisation.
+
+But even if we did this, and a program would improve/jit itself, the fruits of this work are only usable
+during that run of the program. Future invocations, just like future versions of that program, do not
+benefit, and thus start again from scratch, just like in Groundhog Day.
+
+### Storage
+
+So to make that optimisation example work, we would need storage. Theoretically we could make the program
+change its own executable/object files, in ruby even its source. But only theoretically, as we have no
+representation of the code to work on.
+
+In salama we do have an internal representation, both at the code level (ast) and at the compiled code
+level (MethodDefinition, Instructions and friends).
+
+#### Storage Format
+
+Going back to the image, we can ask why it was doomed to fail: because of the binary,
+proprietary implementation. Not because of the idea as such.
+
+Binary data needs a rigorous specification and/or software to work on it. Work, what work?
+We need to merge the data between installations, maintain versions and branches. That sounds a lot like
+version control, because it basically is. Of course this "could" have been solved by the smalltalk
+people, but wasn't. I think it's fair to say that git was the first system to solve that problem.
+
+And git of course works with diff, so for a 3-way merge to be successful we need a text format.
+Which is why I started with yaml, and which is also why sof is text-based.
+
+The other benefit is of course human readability.
+
+So now we have an object file * format in text, and we have git. What we do with it is up to us.
+(* well, I only finished the writer. Reading/parsing is "left as an exercise for the reader":-)
+
+#### Sof as object file format
+
+Ok, I'll sketch it a little: Salama would use sof as its object file format, and only sof would ever be
+stored in git. For developers to work, tools would create source and, when that is edited, compile it to sof.
+
+A program would be a repository of sof and resource files. Some convention for load order would be helpful,
+as would some "area" where programs may collect data or changes to the program. Some may of course alter the
+sofs directly.
+
+How, when and how automatically changes are merged (via git) is up to developer policy. But it is
+easily imaginable that data in program-designated areas gets merged back into the "mainstream" automatically.
+