move md to haml
rubyx/layers.html.haml (new file, 111 lines)
@@ -0,0 +1,111 @@
---
layout: rubyx
title: RubyX architectural layers
---
%h2#main-layers Main Layers
%p
  To implement an object system to execute object oriented languages takes a large system.
  The parts or abstraction layers are detailed below.
%p
  It is important to understand the approach first though, as it differs from the normal
  interpretation. The idea is to
  %strong compile
  ruby. The argument is often made that typed languages are faster, but I don't believe
  that. Dynamic languages just push more functionality into the "virtual machine", and it
  is in fact only the compiling to binaries that gives static languages their speed. This
  is the reason to compile ruby.
%p
  %img{:alt => "Architectural layers", :src => "/assets/layers.jpg"}/
%h3#ruby Ruby
%p
  To compile and run ruby, we first need to parse ruby. While parsing ruby is quite
  a difficult task, it has already been implemented in pure ruby
  %a{:href => "https://github.com/whitequark/parser"}> here
  \. The output of the parser is an ast, which holds information about the code in
  instances of a single
  %em Node
  class. Nodes have a type (which you sometimes see in s-expressions) and a list of children.
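The node shape described above can be sketched in a few lines of plain ruby; the real parser gem's `AST::Node` exposes the same two accessors, `type` and `children`.

```ruby
# Minimal sketch of the Node shape: a type symbol plus a list of
# children (which may be nodes or plain values).
Node = Struct.new(:type, :children)

# "a = 1 + 2" parses to roughly this s-expression shaped tree:
ast = Node.new(:lvasgn, [
  :a,
  Node.new(:send, [Node.new(:int, [1]), :+, Node.new(:int, [2])])
])

ast.type                  # => :lvasgn
ast.children.last.type    # => :send
```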
%p There are two basic problems when working with the ruby ast: one is the a in ast, the other is ruby.
%p
  Since an abstract syntax tree only has one base class, one needs to employ the visitor
  pattern to write a compiler. This ends up being one great class with lots of unrelated
  functions, removing much of the benefit of OO.
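The "one great class" problem can be sketched like this (the method names are illustrative, not the actual rubyx compiler):

```ruby
# With a single Node class, a compiler must dispatch on the type symbol
# and grow one on_<type> method per node type, however unrelated.
Node = Struct.new(:type, :children)

class CompilerVisitor
  def visit(node)
    send(:"on_#{node.type}", node)   # dispatch on the type symbol
  end

  def on_int(node)
    node.children.first              # literal value
  end

  def on_send(node)
    visit(node.children.first)       # ...and so on, one method per
  end                                # node type, all in one class
end

CompilerVisitor.new.visit(Node.new(:int, [42]))  # => 42
```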
%p
  The second, possibly bigger, problem is ruby itself: ruby is full of programmer happiness,
  three ways to do this, five to do that. To simplify that, remove the duplication and
  make analysis easier, Vool was created.
%h3#virtual-object-oriented-language Virtual Object Oriented Language
%p
  Virtual, in this context, means that there is no syntax for this language; it is an
  intermediate representation which
  %em could
  be targeted by several languages.
%p
  The main purpose is to simplify existing oo languages down to their core components: mostly
  calling, assignment, continuations and exceptions. Typed classes for each language construct
  exist and make it easier to transform a statement into a lower level representation.
%p
  Examples of things that exist in ruby but are broken down in Vool are
  %em unless
  , the ternary operator, do-while and for loops, and other similar syntactic sugar.
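The kind of normalization described can be shown as a toy ast-to-ast transform: rewrite an `:unless` node into the equivalent `:if` with a negated condition, so later layers only ever see one conditional form. (Node shape as in the parser section; the real Vool classes are typed, this sketch stays with plain nodes.)

```ruby
Node = Struct.new(:type, :children)

# Desugar :unless into :if with a negated condition, recursively.
def desugar(node)
  return node unless node.is_a?(Node)
  kids = node.children.map { |c| desugar(c) }
  if node.type == :unless
    cond, body = kids
    Node.new(:if, [Node.new(:not, [cond]), body])
  else
    Node.new(node.type, kids)
  end
end

tree = Node.new(:unless, [Node.new(:true, []), Node.new(:int, [1])])
desugar(tree).type  # => :if
```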
%h3#minimal-object-machine Minimal Object machine
%p
  We compile Vool statements into Mom instructions. Mom is a machine, which means it has
  instructions. But unlike a cpu (or the risc layer below) it does not have memory, only objects.
  It also has no registers, and together these two things mean that all information is stored in
  objects. The calling convention is also object based and uses Frame and Message instances to
  save state.
%p
  Objects are typed, and are in fact the same objects the language operates on; just the
  functionality is expressed through instructions. Methods are defined (as vool) on classes
  and then compiled to Mom/Risc/Arm, and the results are stored in the method object.
%p
  Compilation to Mom happens in two stages:
  1. The linear statements/code is translated to Mom instructions.
  2. Control statements are translated to jumps and labels.
%p
  The second step leaves a linked list of machine instructions as the input for the next stage.
  In the future a more elaborate system of optimisations is envisioned between these stages.
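The second stage can be sketched in miniature: an if-statement becomes a conditional jump plus a label, leaving only a flat sequence. (Real Mom instructions are objects in a linked list; this toy uses an array of plain structs with illustrative names.)

```ruby
Label     = Struct.new(:name)
JumpFalse = Struct.new(:condition, :target)  # jump to target if condition fails

# Turn "if condition then <body>" into a flat instruction sequence.
def linearize(condition, then_instructions)
  merge = Label.new("merge")
  [JumpFalse.new(condition, merge),  # skip the branch body on false
   *then_instructions,
   merge]                            # both paths continue here
end

code = linearize(:not_zero, [:call_then_branch])
code.map(&:class)  # => [JumpFalse, Symbol, Label]
```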
%h3#risc Risc
%p
  The Register machine layer is a relatively close abstraction of risc hardware, but without
  the quirks.
%p
  The Risc machine has registers, indexed addressing, operators, branches and everything
  needed for the next layer. It does not try to abstract every possible machine feature
  (like llvm), but rather "objectifies" the general risc view to provide what is needed for
  the Mom layer, the next layer up.
%p
  The machine has its own (abstract) instruction set, and the mapping to arm is quite
  straightforward. Since the instruction set is implemented as derived classes, additional
  instructions may be defined and used later, as long as a translation is provided for them too.
  In other words the instruction set is extensible (unlike cpu instruction sets).
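The extensibility point can be sketched as follows: an instruction is a subclass, and extending the set means adding one class plus one translation method. (Class and method names here are illustrative, not the actual rubyx classes.)

```ruby
class Instruction; end

class LoadConstant < Instruction
  attr_reader :register, :value
  def initialize(register, value)
    @register, @value = register, value
  end
end

class Translator
  # Dispatch on the instruction's class, as an ArmTranslator might.
  def translate(instruction)
    send(:"translate_#{instruction.class.name.downcase}", instruction)
  end

  def translate_loadconstant(ins)
    "mov #{ins.register}, ##{ins.value}"   # pseudo-arm output
  end
end

Translator.new.translate(LoadConstant.new(:r0, 42))  # => "mov r0, #42"
```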
%p
  Basic object oriented concepts are needed already at this level, to be able to generate a
  whole self contained system: what an object is, a class, a method etc. This minimal runtime
  is called parfait, and the same objects are used at runtime and compile time.
%p
  Since working at this low machine level (essentially assembler) is not easy to follow for
  everyone (me :-), an interpreter was created (by me :-). Later a graphical interface, a kind of
  %a{:href => "https://github.com/ruby-x/rubyx-debugger"} visual debugger
  was added.
  Visualizing the control flow and being able to see values updated immediately helped
  tremendously in creating this layer. And the interpreter helps in testing, ie keeping it
  working in the face of developer change.
%h3#binary--arm-and-elf Binary, Arm and Elf
%p
  A physical machine will run binaries containing instructions that the cpu understands, in a
  format the operating system understands (elf). The arm and elf subdirectories hold the code
  for these layers.
%p
  Arm is a risc architecture, but as anyone who knows it will attest, one with its own quirks.
  For example, any instruction may be executed conditionally in arm. Or: there is no 32bit
  register load instruction. It is possible to create very dense code using all the arm
  special features, but this is not implemented yet.
%p
  All Arm instructions derive from a Register instruction, and there is an ArmTranslator
  that translates RegisterInstructions to ArmInstructions.
rubyx/layers.md (deleted, 109 lines)
@@ -1,109 +0,0 @@
rubyx/memory.html.haml (new file, 45 lines)
@@ -0,0 +1,45 @@
---
layout: rubyx
title: Types, memory layout and management
---
%p Memory management must be one of the main horrors of computing. That's why garbage collected languages like ruby are so great. Even simple malloc implementations tend to be quite complicated. Unnecessarily so, if one uses object oriented principles of data hiding.
%h3#object-and-values Object and values
%p As has been mentioned, in a true OO system object tagging is not really an option. Tagging is the technique of adding the lowest bit as a marker to pointers, and thus having to shift ints and losing a bit. Mri does this for Integers but not for other value types. We accept this, work with it, and just say "of course", but it's not modeled well.
%p Integers are not Objects like "normal" objects. They are Values, on par with ObjectReferences, and have the following distinctive differences:
%ul
  %li equality implies identity
  %li constant for whole lifetime
  %li pass by value semantics
%p If integers were normal objects, the first would mean they would be singletons. The second means you can't change them; you can only change a variable to hold a different value. It also means you can't add instance variables to an integer, nor singleton_methods. And the third means that if you do change the variable, a passed value will not be changed. Also they are not garbage collected. If you noticed how weird that idea is (the gc), you can see how natural the Value idea is.
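The three listed properties can be observed on Integer in recent MRI, which (as discussed above) implements them via tagging:

```ruby
a = 1
b = 1

a.equal?(b)   # equality implies identity: the very same object in MRI
a.frozen?     # constant for its whole lifetime

begin
  a.instance_variable_set(:@note, :x)  # no instance variables on a value
rescue FrozenError
  # modifying the value is rejected
end
```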
%p Instead of trying to make this difference go away (like MRI does) I think it should be explicit, and indeed be expanded to all Objects that have these properties. Words, for example (ruby calls them Symbols), are the same. A Table is a Table, and Toble is not. Floats (all numbers) and Times are the same.
%h3#object-type Object Type
%p So if we're not tagging, we must pass and keep the type information around separately. For passing, it has been mentioned that a separate register is used.
%p For keeping track of the type data we need to decide how many types we support. The register for passing gives the upper limit of 4 bits, and this fits well with the idea of cache lines. So if we use cache lines, for every 8 words we take one for the type.
%p Traditionally the class of the object is stored in the object. But this forces the dynamic lookup that is a good part of the performance problem. Instead we store the Object's Type. The Type then stores the Class, but it is the type that describes the memory layout of the object (and all objects with the same type).
%p This is in essence a level of indirection that gives us the space to have several Types for one class, and so we can evolve the class without having to change the Type (we just create new ones for every change).
%p
  The memory layout of
  %strong every
  object is a type word followed by "data".
%p That leaves the length open, and we can use the 8th 4 bits to store it. That gives a maximum of 16 Lines.
%h4#continuations Continuations
%p
  But (I hear), ruby is dynamic, we must be able to add variables and methods to an object at any time.
  So the type can't be fixed. Ok, we can change the Type every time, but when any empty slots have
  been used up, what then?
%p
  Then we use Continuations: instead of adding a new variable to the end of the object, we use a
  new object and store it in the original object, thus extending the object.
%p
  Continuations are pretty normal objects, and it is just up to the object to manage the redirection.
  Of course this may splatter objects a little, but in a running application this does not really happen much. Most instance variables are added quite soon after startup, just as functions are usually parsed in the beginning.
%p The good side of continuations is also that we can be quite tight on initial allocation, and even minimal with continuations. Continuations can be completely changed out after all.
%h3#pages-and-spaces Pages and Spaces
%p
  Now that we have the smallest units taken care of, we need to store them and allocate and manage larger chunks. This is much
  simpler, and we can use a fixed size Page of, say, 256 lines.
%p The highest order is a Space, which is just a list of Pages. Spaces manage Pages in a very similar way to how Pages manage Objects, ie as linked lists of free Objects/Pages.
%p
  A Page, like a Space, is of course a normal object. The actual memory materialises out of nowhere, but then gets
  filled immediately with objects. So no empty memory is managed, just objects that can be repurposed.
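The free-list scheme described can be sketched like this (hypothetical names, not the actual rubyx classes): a Page never hands out raw memory, only pre-made slots from a linked free list, and a Space would manage Pages the same way.

```ruby
class Page
  FreeSlot = Struct.new(:next_free)

  def initialize(slots)
    @free = nil
    slots.times { @free = FreeSlot.new(@free) }  # chain up the free list
  end

  def allocate
    slot = @free or raise "page full"
    @free = slot.next_free
    slot   # a repurposed object, never "empty memory"
  end

  def release(slot)
    slot.next_free = @free   # returned slots go back on the list
    @free = slot
  end
end

page = Page.new(2)
a = page.allocate
page.release(a)
```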
rubyx/memory.md (deleted, 58 lines)
@@ -1,58 +0,0 @@
rubyx/optimisations.html.haml (new file, 76 lines)
@@ -0,0 +1,76 @@
---
layout: rubyx
title: Optimisation ideas
---
%p I won't manage to implement all of these ideas in the beginning, so I just jot them down.
%h3#avoid-dynamic-lookup Avoid dynamic lookup
%p This of course is a broad topic, which may be seen under the heading of caching. Slightly wrongly though, in my view, as avoiding lookups is really the aim. Especially for variables.
%h4#i---instance-variables I - Instance Variables
%p Ruby has dynamic instance variables, meaning you can add a new one at any time. This is as it should be.
%p
  But this can easily lead to a dictionary/hash type of implementation. As variable "lookup" is probably
  %em the
  most common thing an OO system does, that leads to bad performance (unnecessarily).
%p
  So instead we keep variables laid out c++ style, continuous, array style, at the address of
  the object. Then we have to manage that in a dynamic manner. This (as I mentioned
  %a{:href => "memory.html"}> here
  ) is done by the indirection of the Type. A Type is a dynamic structure mapping names to
  indexes (actually implemented as an array too, but the api is hash-like).
%p
  When a new variable is added, we create a
  %em new
  Type and change the Type of the object. We can do this as the Type will
  determine the Class of the object, which stays the same. The memory page mentions how this works with constant sized objects.
%p So, problem one fixed: instance variable access is O(1).
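The Type indirection just described can be sketched in a few lines (hypothetical names, not the actual rubyx api): names map to fixed indexes, so access is a plain array index, and adding a variable never mutates a Type, it produces a new one.

```ruby
class Type
  attr_reader :object_class, :names

  def initialize(object_class, names)
    @object_class = object_class
    @names = names.freeze            # Types never change
  end

  def index_of(name)
    @names.index(name)               # resolved once, then a plain index
  end

  def add_instance_variable(name)    # "new variable => new Type"
    Type.new(@object_class, @names + [name])
  end
end

point   = Type.new(:Point, [:x])
point3d = point.add_instance_variable(:y)
point3d.index_of(:y)   # => 1
point.index_of(:y)     # => nil
```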
%h4#ii---method-lookup II - Method lookup
%p Of course that helps with Method access. All Methods are, in the end, variables on some (class) object. But as we can't very well have the same (continuous) index for a given method name on all classes, it has to be looked up. Or does it?
%p
  Well, yes it does, but maybe not more than once: we can conceivably store the result, except of course not in a dynamic
  structure, as that would defeat the purpose.
%p
  In fact there could be several caching strategies, possibly for different use cases, possibly determined by actual run-time
  measurements, but for now I just describe a simple one using Data-Blocks, Plocks.
%p
  So at a call-site, we know the name of the function we want to call, and the object we want to call it on, and so have to
  find the actual function object, and by that the actual call address. In abstract terms we want to create a switch with
  3 cases and a default.
%p
  So the code is something like: if first cache hit, call first cache, and so on, times three; and if not, do the dynamic lookup.
  The Plock can store those cache hits inside the code. So then we "just" need to get the cache loaded.
%p Initializing the cached values is done by normal lazy initialization, ie we check for nil and if so we do the dynamic lookup and store the result.
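The 3-way cache with lazy initialization can be sketched at the ruby level (a hypothetical stand-in: it caches the receiver's class where rubyx would cache the Type, and generated code would inline the three cases rather than loop):

```ruby
class CallSite
  SLOTS = 3

  def initialize(name)
    @name  = name
    @types = []    # cached receiver types
    @funcs = []    # matching (unbound) functions
  end

  def call(receiver, *args)
    type = receiver.class                 # stand-in for the rubyx Type
    @types.each_with_index do |t, i|      # the "switch with 3 cases"
      return @funcs[i].bind(receiver).call(*args) if t == type
    end
    func = receiver.class.instance_method(@name)  # default: dynamic lookup
    if @types.length < SLOTS                      # lazily fill a free slot
      @types << type
      @funcs << func
    end
    func.bind(receiver).call(*args)
  end
end

site = CallSite.new(:upcase)
site.call("ab")   # => "AB" (slow path, fills the first slot)
site.call("cd")   # => "CD" (served from the cache)
```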
%p
  Remember, we cache Type against function address. Since Types never change, we're done. We could (as hinted above)
  do things with counters or round robins, but that is for later.
%p
  Alas: while Types are constant, darn the ruby, method implementations can actually change! And while it is tempting to
  just create a new Type for that too, that would mean going through existing objects and changing the Type; not good.
  So we need change notifications: when we cache, we must register a change listener and update the generated function,
  or at least nullify it.
%h3#inlining Inlining
%p
  Ok, this may not need too much explanation. Just work. It may be interesting to experiment with how much this saves, and how much
  inlining is useful. I could imagine that at some point it's the register shuffling that determines the effort, not the
  actual call.
%p Again the key is the update notifications when some of the inlined functions have changed.
%p
  And it is important to code the functions so that they have a single exit point, otherwise it gets messy. Up to now this
  was quite simple, but then blocks and exceptions are still undone.
%h3#register-negotiation Register negotiation
%p
  This is a little less baked, but it comes from the same idea as inlining. As calling functions involves a lot of register
  shuffling, we could try to avoid some of that.
%p More precisely: usually calling conventions have registers in which arguments are passed. And to call an "unknown", ie any, function, some kind of convention is necessary.
%p
  But for "cached" functions, where the function is known, it is possible to do something else. And since we have the source
  (ast) of the function around, we can do things previously impossible.
%p One such thing may be to recompile the function to accept arguments exactly where they are in the calling function. Well, now that it's written down, it does sound a lot like inlining, except without the inlining :-)
%p
  An expansion of this idea would be to have a Negotiator on every function call. Meaning that the calling function would not
  do any shuffling, but instead call a Negotiator, and the Negotiator does the shuffling and calling of the function.
  This only really makes sense if the register shuffling information is encoded in the Negotiator object (and does not have
  to be passed).
%p
  Negotiators could do some counting and do the recompiling when it seems worth it. The Negotiator would remove itself from
  the chain and connect called and new receiver directly. How much is in this I couldn't say though.
rubyx/optimisations.md (deleted, 84 lines)
@@ -1,84 +0,0 @@
|
||||
---
|
||||
layout: rubyx
|
||||
title: Optimisation ideas
|
||||
---
|
||||
|
||||
I won't manage to implement all of these idea in the beginning, so i just jot them down.
|
||||
|
||||
### Avoid dynamic lookup
|
||||
|
||||
This off course is a broad topic, which may be seen under the topic of caching. Slightly wrongly though in my view, as avoiding them is really the aim. Especially for variables.
|
||||
|
||||
#### I - Instance Variables
|
||||
|
||||
Ruby has dynamic instance variables, meaning you can add a new one at any time. This is as it should be.
|
||||
|
||||
But this can easily lead to a dictionary/hash type of implementation. As variable "lookup" is probably *the* most
|
||||
common thing an OO system does, that leads to bad performance (unneccessarily).
|
||||
|
||||
So instead we keep variables layed out c++ style, continous, array style, at the address of the object. Then we have
|
||||
to manage that in a dynamic manner. This (as i mentioned [here](memory.html)) is done by the indirection of the Type. A Type is
|
||||
a dynamic structure mapping names to indexes (actually implemented as an array too, but the api is hash-like).
|
||||
|
||||
When a new variable is added, we create a *new* Type and change the Type of the object. We can do this as the Type will
|
||||
determine the Class of the object, which stays the same. The memory page mentions how this works with constant sized objects.
|
||||
|
||||
So, Problem one fixed: instance variable access at O(1)
|
||||
|
||||
#### II - Method lookup
|
||||
|
||||
Off course that helps with Method access. All Methods are at the end variables on some (class) object. But as we can't very well have the same (continuous) index for a given method name on all classes, it has to be looked up. Or does it?
|
||||
|
||||
Well, yes it does, but maybe not more than once: We can conceivably store the result, except off course not in a dynamic
|
||||
structure as that would defeat the purpose.
|
||||
|
||||
In fact there could be several caching strategies, possibly for different use cases, possibly determined by actual run-time
|
||||
measurements, but for now I just destribe a simeple one using Data-Blocks, Plocks.
|
||||
|
||||
So at a call-site, we know the name of the function we want to call, and the object we want to call it on, and so have to
|
||||
find the actual function object, and by that the actual call address. In abstract terms we want to create a switch with
|
||||
3 cases and a default.
|
||||
|
||||
So the code is something like, if first cache hit, call first cache , .. times three and if not do the dynamic lookup.
|
||||
The Plock can store those cache hits inside the code. So then we "just" need to get the cache loaded.
|
||||
|
||||
Initializing the cached values is by normal lazy initialization. Ie we check for nil and if so we do the dynamic lookup, and store the result.
|
||||
|
||||
Remember, we cache Type against function address. Since Types never change, we're done. We could (as hinted above)
|
||||
do things with counters or robins, but that is for later.
|
||||
|
||||
Alas: While Types are constant, darn the ruby, method implementations can actually change! And while it is tempting to
|
||||
just create a new Type for that too, that would mean going through existing objects and changing the Type, nischt gut.
|
||||
So we need change notifications, so when we cache, we must register a change listener and update the generated function,
|
||||
or at least nullify it.
|
||||
|
### Inlining

Ok, this may not need too much explanation: just work. It would be interesting to experiment with how much this saves, and
how much inlining is useful. I could imagine that at some point it is the register shuffling that determines the effort,
not the actual call.

Again the key is the update notifications when any of the inlined functions has changed.

And it is important to code the functions so that they have a single exit point, otherwise it gets messy. Up to now this
was quite simple, but blocks and exceptions are still undone.
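To make the single-exit-point requirement concrete, here is the shape of the transformation on a toy Ruby method (illustrative only; RubyX would do this on its intermediate representation, not on Ruby source):

```ruby
# Multiple exits: each early return would have to jump past the caller's
# remaining code once the body is spliced in, which gets messy.
def sign_multi(x)
  return -1 if x < 0
  return 1 if x > 0
  0
end

# Single exit: the whole body funnels into one result variable, so the
# inlined body can simply fall through into the caller's next instruction.
def sign_single(x)
  result = 0
  if x < 0
    result = -1
  elsif x > 0
    result = 1
  end
  result
end
```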
### Register negotiation

This is a little less baked, but it comes from the same idea as inlining. As calling functions involves a lot of register
shuffling, we could try to avoid some of that.

More precisely: calling conventions usually specify registers in which arguments are passed, and to call an "unknown",
i.e. any function, some kind of convention is necessary.

But for "cached" functions, where the function is known, it is possible to do something else. And since we have the source
(ast) of the function around, we can do things previously impossible.

One such thing may be to recompile the function to accept arguments exactly where they already are in the calling
function. Well, now that it's written down, it does sound a lot like inlining, except without the inlining :-)

An expansion of this idea would be to have a Negotiator on every function call, meaning that the calling function would
not do any shuffling, but instead call a Negotiator, and the Negotiator does the shuffling and the calling of the
function. This only really makes sense if the register shuffling information is encoded in the Negotiator object (and does
not have to be passed).
Negotiators could do some counting and trigger the recompiling when it seems worth it. The Negotiator would then remove
itself from the chain and connect caller and callee directly. How much is in this I couldn't say, though.
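In plain Ruby terms the counting Negotiator could look roughly like this (a sketch under assumed names; the threshold and the recompile hook are invented for illustration):

```ruby
# Sketch of a Negotiator that counts calls and, past a threshold,
# splices itself out of the call chain by switching to a direct callee.
class Negotiator
  THRESHOLD = 3

  def initialize(callee, &recompile)
    @callee = callee
    @recompile = recompile  # produces a shuffling-free direct callee
    @count = 0
    @direct = nil
  end

  def call(*args)
    return @direct.call(*args) if @direct       # already spliced out
    @count += 1
    if @count >= THRESHOLD
      @direct = @recompile.call(@callee)        # recompile once, then bypass
      return @direct.call(*args)
    end
    shuffle(args)                               # the shuffling the caller avoided
    @callee.call(*args)
  end

  private

  def shuffle(_args)
    # stand-in for moving arguments into convention registers
  end
end
```

In the real system the Negotiator would patch the call-site itself; here the memoized `@direct` merely models that bypass.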
72
rubyx/threads.html.haml
Normal file
@ -0,0 +1,72 @@
%hr/
%p
layout: rubyx
title: Threads are broken
author: Torsten
—
%p
Having just read about Ruby’s threads, I was moved to collect my thoughts on the topic. How this will influence the
implementation I am not sure yet. But it is good to get it out on paper as a basis for communication.
%h3#processes Processes
%p
I find it helps to consider why we have threads. Before threads, unix had only processes and ipc,
so inter-process-communication.
%p
Processes were a good idea, keeping each program safe from the mistakes of others by restricting access to the
process’s own memory. Each process had the view of “owning” the machine, being alone on the machine as it were. Each a
small turing/von neumann machine.
%p
But one had to wait for io and the network, and so it was difficult, or even impossible, to get one process to use the
machine to the hilt.
%p
IPC mechanisms were and are sockets, shared memory regions and files, each with their own sets of strengths, weaknesses
and api’s, all deemed complicated and slow. Each switch incurs a process switch, and processes are not lightweight
structures.
%h3#thread Thread
%p
And so threads were born as a lightweight mechanism for getting more things done concurrently, because when the one
thread is in a kernel call, it is suspended.
%h4#green-or-fibre Green or fibre
%p
The first threads, which people did without kernel support, were quickly found not to solve the problem so well,
because as soon as any thread calls the kernel, all threads stop. Not that much won then, one might think, but wrongly.
%p
Now that green threads are coming back into fashion as fibres, they are used for lightweight concurrency and actor
programming, and we find that the different viewpoint can help to express some solutions more naturally.
%h4#kernel-threads Kernel threads
%p
The real solution, where the kernel knows about threads and does the scheduling, took a while to become standard, and
it makes processes more complicated by a fair degree. Luckily we don’t code kernels and don’t have to worry.
%p
But we do have to deal with the issues that come up. The issue is of course data corruption. I don’t even want to go
into how to fix this, or the different ways that have been introduced, because the main thrust becomes clear in the
next chapter:
%h3#broken-model Broken model
%p
My main point about threads is that they are one of the worst hacks, especially in a c environment. Processes had a
good model of a program with a global memory. The equivalent of threads would have been shared memory with
%strong many
programs
connected. A nightmare. It even breaks that old turing idea, and so it is very difficult to reason about what goes on
in a multi-threaded program; the only way this is achieved is by developing a more restrictive model.
%p
In essence the thread memory model is broken. Ideally I would not like to implement it, or if it is implemented, at
least fix it first.
%p But what is the fix? It is in essence what the process model was, i.e. each thread has its own memory.
%h3#thread-memory Thread memory
%p
In OO it is possible to fix the thread model, just because we have no global memory access. In effect the memory model
must be inverted: instead of almost all memory being shared by all threads and each thread having a small thread-local
storage, threads must have mostly thread-specific data and a small amount of shared resources.
%p
A thread would thus work as a process used to. In essence it can update any data it sees without restrictions. It must
exchange data with other threads through specified global objects, which take the role of what ipc used to be.
%p In an oo system this can be enforced by strict pass-by-value over thread borders.
%p
The itc (inter-thread-communication) objects are the only ones that need current thread synchronization techniques.
The one mechanism that could cover all needs could be a simple list.
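In today’s Ruby the shape of such an itc object can be sketched with a Queue that deep-copies everything crossing the thread border (plain Ruby, `ItcChannel` is an invented name; RubyX would enforce the pass-by-value rule at the language level rather than by copying):

```ruby
# Sketch of an inter-thread-communication object with strict pass-by-value,
# as described above. The Queue is the single synchronized shared resource.
class ItcChannel
  def initialize
    @queue = Queue.new
  end

  # deep-copy on the way in, so sender and receiver never share the object
  def send_value(obj)
    @queue << Marshal.load(Marshal.dump(obj))
  end

  def receive
    @queue.pop  # blocks until a value arrives
  end
end
```

The receiver always gets its own copy, so no other synchronization is needed on the data itself.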
%h3#rubyx RubyX
%p
The original problem, of what a program does during a kernel call, could be solved by a very small number of kernel
threads. Any kernel call would be listed, and “c” threads would pick the calls up, execute them and return the result.
%p
All other threads could be managed as green threads. Threads may not share objects, other than a small number of
system-provided ones.