move posts in directory by year

2019-12-07 11:30:52 +02:00
parent 5ce2d1b625
commit 285c6531e4
38 changed files with 8 additions and 11 deletions

@@ -0,0 +1,115 @@
%p
Now that i
%em have
had time to write some more code (250 commits last month), here is
the good news:
%h2#sending-is-done Sending is done
%p
A dynamic language like ruby really has dynamic method resolution at its heart. Without
that we'd be writing C++. Not much can be done in ruby without looking up methods.
%p
Yet all this time i have been running circles around this mother of a problem, because
(after all) it is a BIG one. It must be the single most important reason why dynamic
languages are interpreted and not compiled.
%h2#a-brief-recap A brief recap
%p
Last year i already started on a rewrite, after hitting this exact same wall for the fourth
time. I put in some more layers, the way a good programmer fixes any daunting problem.
%p
The
%a{:href => "https://github.com/ruby-x/rubyx"} Readme
has quite a good summary on the new layers,
and of course i'll update the architecture soon. But in case you didn't click, here is the
very very short summary:
%ul
%li Vool is a Virtual Object Oriented Language.
Virtual in that it has no syntax of its own. But
it has semantics, and those are substantially simpler than ruby. Vool is Ruby without
the fluff.
%li Mom, the Minimal Object Machine layer is the first machine layer.
Mom has no concept of memory
yet, only objects. Data is transferred directly from object
to object with one of Mom's main instructions, the SlotLoad.
%li Risc layer here abstracts the Arm in a minimal and independent way.
It does not model
any real RISC cpu instruction set, but rather implements what is needed for rubyx.
%li Arm and Elf:
There is a minimal
%em Arm
translator that transforms Risc instructions to Arm instructions.
Arm instructions assemble themselves into binary code. A minimal
%em Elf
implementation is
able to create executable binaries from the assembled code and Parfait objects.
%li Parfait:
Generating code (by descending the above layers) is only half the story in an oo system.
The other half is classes, types, constant objects and a minimal run-time. This is
what Parfait is.
%h2#compiling-and-building Compiling and building
%p
After having finished all this layering work, i was back to square
= succeed ":" do
%em resolve
%p
But of course when i got there i started thinking that the resolve method (in ruby)
would need to resolve itself. And after briefly considering cheating (hardcoding type
information into this
%em one
method), i opted to write the code in Risc. Basically assembler.
%p
And it was horrible. It worked, but it was completely unreadable. So then i wrote a dsl for
generating risc instructions, using a combination of method_missing, instance_eval and
operator overloading. The result is quite readable code, a mixture between assembler and
a mathematical notation, where one can just freely name registers and move data around
with
%em []
and
= succeed "." do
%em <<
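%p
To give a flavour of the idea, here is a minimal sketch of such a dsl in plain ruby.
The class names (Builder, Register, Slot) and the recorded instruction tuples are
invented for this illustration and are not the actual rubyx code. It only shows how
method_missing can name registers on the fly, how [] can address a slot, and how
the shift operator can record a data move.

```ruby
# Hypothetical sketch, not the rubyx Builder: all names and the
# instruction tuples are made up for illustration.
class Slot
  attr_reader :register, :index

  def initialize(register, index)
    @register = register
    @index = index
  end
end

class Register
  attr_reader :name

  def initialize(name, builder)
    @name = name
    @builder = builder
  end

  # register[:field] names a slot of the object the register points to
  def [](index)
    Slot.new(self, index)
  end

  # register << source records a move into this register
  def <<(source)
    case source
    when Slot
      @builder.add([:slot_to_reg, source.register.name, source.index, name])
    when Register
      @builder.add([:transfer, source.name, name])
    else
      @builder.add([:load_constant, source, name])
    end
    self
  end
end

class Builder
  def initialize
    @instructions = []
    @registers = {}
  end

  def add(instruction)
    @instructions << instruction
  end

  # any unknown bare word (without arguments) becomes a register of that name
  def method_missing(name, *args)
    return super unless args.empty?
    @registers[name] ||= Register.new(name, self)
  end

  # instance_eval makes the bare words inside the block reach method_missing
  def build(&block)
    instance_eval(&block)
    @instructions
  end
end

code = Builder.new.build do
  receiver << message[:receiver]   # read a slot into a freely named register
  cache << receiver                # register to register
  counter << 42                    # load a constant
end
```

%p
The block reads almost like assignments, while the builder quietly collects
one instruction per line.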
%p
By then resolving worked, but it was still a method. Since it was already in risc, i basically
inlined the code by creating a new Mom instruction and moving the code to its
= succeed "." do
%em to_risc
%p
A small bug in calling the resulting method was fixed, and
= succeed "," do
%em voila
%h2#the-proof The proof
%p
Previous, static, Hello Worlds looked like this:
%blockquote
"Hello world".putstring
%p
Of course we can know the type that putstring applies to and so this does not
involve any method resolution at runtime, only at compile time.
%p
Today's step is thus:
%blockquote
a = "Hello World"
%br
a.putstring
%p
This does involve a run-time lookup of the
%em putstring
method. It being a method on String,
it is indeed found and called. (1) Hurray.
%p
And maths works too:
%blockquote
a = 150
%br
a.div10
%p
Does indeed result in 15. Also most operators (+, -, <<) work. Even with the
%em new
integers. Part of the rewrite was to upgrade integers to first class objects.
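%p
The post does not show what div10 looks like inside. Purely as an illustration, a
divide by ten can be built from shifts and adds alone, which is the well known
approximation-plus-correction trick. Whether rubyx uses exactly this sequence is an
assumption. Only the method name is borrowed from above, and the sketch is meant for
32 bit unsigned values.

```ruby
# Sketch only: divide a 32 bit unsigned integer by ten without using
# a divide instruction (shift-and-add estimate plus one correction).
def div10(n)
  q = (n >> 1) + (n >> 2)   # q is about n * 0.75
  q += q >> 4               # refine the estimate towards n * 0.8
  q += q >> 8
  q += q >> 16
  q >>= 3                   # q is about n / 10, possibly one too small
  r = n - q * 10            # remainder against the estimate
  q + ((r + 6) >> 4)        # add one when the remainder is 10 or more
end

result = div10(150)
```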
%p
PS(1): I know with more analysis the compiler
%em could
know that
%em a
is a String (or Integer),
but just now it doesn't. Take my word for it or even better, read the code.

@@ -0,0 +1,90 @@
%p
After
=link_to "finishing the code," , "/blog/a-dynamic-hello-world"
i updated all the docs too!
%h2 The rewrite
%p
Doing anything for the first time is not so easy. I have taught enough by now to see
how central
%em guidance,
the experience of another, is to the process of learning.
%br
I was thinking so much about VMs in the beginning that a lot went sideways.
%p
Now it feels
=link_to "the abstractions", "rubyx/layers.html"
are coming into focus, the code is clean and relatively easy to understand.
%p
During this latest wobble, maybe 500 commits in all, almost everything above the
Risc layer was rewritten. At the low point, i was down to just over 400 tests, but
now, back strong, at over 800. That's 94% coverage with a
=ext_link "CodeClimate A", "https://codeclimate.com/github/ruby-x/rubyx/"
so that's ok.
%p
In the process i got much closer to the actual goal, which i'll go into in more detail.
%h2 The docs
%p
Now i have also cleaned up all the documentation. This does not mean that everything
is documented, but i hope one can get a good idea from the docs, and then just
read the code.
%h3 Architecture
%p
The
=link_to "Architecture" , "/rubyx/layers.html"
section gives an overview of the new layers.
%ul
%li Ruby Simplified: Vool
%li Mom, a simple machine with object memory
%li Risc, the old abstraction of a CPU
%li Arm and Elf, to actually generate binaries
%li Parfait and Builtin to get the system up
%h3 Parfait
%p
There is a separate document describing the classes needed to boot the system.
A little about the current state of Types and Classes.
%p
But it should definitely be expanded, and there is nothing about Builtin.
Builtin is the way to write methods that can not be expressed in ruby. And since
writing them got so messy i wrote a DSL, which is only documented in
=ext_link "code." , "https://github.com/ruby-x/rubyx/blob/master/lib/risc/builder.rb"
%h3 Calling
%p
Since Calling is now done, i documented both the
=link_to "calling convention,", ""
and the way
=link_to "method resolution", ""
is done.
%h3 Interpreter
%p
Of course the
=link_to "Interpreter", "rubyx/debugger.html"
is still working, (since the Risc layer didn't change much) and is a large part of
the testing scheme.
%p
And i even got the
=link_to "Debugger" , "/debugger"
working again and integrated into the new site (which is now a rails app).
%h3 Misc
%p
Finally i cleaned up the old mumbo jumbo docs and sorted them a bit into what
is ideas, plans and just background info, in the
=link_to "Misc section" , "/misc/index.html"
%h2 Next Steps
%p
The plan for the near future is something like this:
%ul
%li
More complicated tests. Whole methods that do something and
tests containing several methods. All just testing results.
%li Better test framework for testing binaries.
%li Blocks
%li Baby steps towards Stdlib
%p As one can see, work happens when i have time and inspiration.
%p.full_width
=image_tag "github-timeline-2018.jpg"
But even after 4 years, i haven't given up yet :-)
Though i may have given up on any time estimates.

@@ -0,0 +1,126 @@
%p
It was almost going to be working binaries and over 1000 tests. But i am coming
more and more to the point where software is measured in number of tests, not
lines of code.
%h2 1000+ Tests
%p.full_width
=image_tag "1000_tests.jpg"
It was shortly after the last post that i first noticed that 1k was approaching.
A little hard to grasp that i have written all those, kind of: what do they
all do?
%p
A good step too: just about 200 tests in 2 months of work. I noticed that the couple of
times i didn't have good coverage for new code immediately, i started to have
problems and had to write tests later. It seems it is the only way to understand
even my own code anymore: by making the assumptions explicit. Some of the bugs
where tests were missing are just the classics, +1 or -1 errors. And i feel like a real
newbie having to debug for 5 hours to find it was "<" not "<=" .
%h2 Working binaries
%p
But of course it feels good to finally have
%b working binaries.
This is after all the first time that i compile real ruby into real binary.
The compiler does of course have many limitations, but what it does, it does
right. Even if it was just Hello World for starters.
%br
=image_tag "hello.jpg"
Of course i tried the next one straight after, "2+2" and .... it worked too.
I don't know if that is just my bad habit or an occupational thing, this
being surprised when things work.
%br
But a little bit about the journey and how this works.
%h3 Positioning
%p
Since ruby-x's approach is oo from the start, we do not rely on the C way of creating
binaries. Instead, the binary is a sort of snapshot of a running system, or
in other words there is only heap.
%p
This means we create binaries that look the same as the memory during runtime,
which is made up of small fixed sized objects. Currently we only have objects
of sizes 2 to the power of 2, 3, 4 and 5. Maybe larger later, but with oo's
complete data hiding it is easy to extend objects transparently.
%h3 Constant loading
%p
Especially for Code (the only objects larger than 16 words currently),
this presented a challenge. Maybe even an extra challenge on top of the
purely static one, because of the way ARM loads constants.
%p
Constant loading happens when a known object or address is loaded into a register.
Arm's fixed-size 32 bit instructions only allow 10 bit constants to be loaded.
So if the constant is larger (eg the object further away) two instructions instead
of one are needed. But this only becomes clear when all positions of objects
have been determined.
%h3 Event approach
%p
Of course this is not new, and this is in fact the third time i have coded this,
finally getting it right. The problem gets hairy with the 16 words limit,
when the code overlaps the originally assigned length and a new object has
to be inserted.
%p
To keep one method's code continuous, all other methods' code has to be moved up,
and thus a whole lot of positions change. Of course when some object's position
changes, a load depending on that may go from 1 to 2 instructions and so on and
on.
And then there are of course the branches that load their targets (forward and backward
branches), and they need to be updated etc etc.
%p
I now have position objects which fire events, and about 4 different kinds of
listeners reacting in different ways when different objects change. The whole
thing works, though as with many an event system, it is difficult to say
exactly how. (only easy in the small, not the whole i mean)
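%p
In the small, the mechanism can be pictured roughly like this. It is a sketch under
assumptions: the names Position, on_change, move_to and LoadConstant are invented
here and do not mirror the rubyx classes. The 10 bit reach is the figure
mentioned above.

```ruby
# Hypothetical sketch of positions that notify listeners; class and
# method names are not taken from rubyx.
class Position
  attr_reader :address

  def initialize(address)
    @address = address
    @listeners = []
  end

  def on_change(&listener)
    @listeners << listener
  end

  # moving a position fires an event to everybody who depends on it
  def move_to(new_address)
    return if new_address == @address
    old_address = @address
    @address = new_address
    @listeners.each { |listener| listener.call(old_address, new_address) }
  end
end

# A constant load needs one instruction while the target is within
# reach of a 10 bit constant, otherwise two.
class LoadConstant
  MAX_OFFSET = (1 << 10) - 1
  attr_reader :length

  def initialize(own, target)
    @own = own
    @target = target
    recalculate
    own.on_change { recalculate }
    target.on_change { recalculate }
  end

  def recalculate
    distance = (@target.address - @own.address).abs
    @length = distance <= MAX_OFFSET ? 1 : 2
  end
end

load_position = Position.new(0)
target = Position.new(512)
load_instruction = LoadConstant.new(load_position, target)
before = load_instruction.length   # target is within reach
target.move_to(4096)               # some other code grew, the target moved up
after = load_instruction.length    # the load had to grow too
```

%p
In a full system the grown load would in turn move everything behind it, which is
where the cascade of events comes from.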
%h3 Object continuation
%p
As i mentioned, it is quite straightforward to have larger data amounts, made up
of 16 word chunks, by having a linked list. This is how the BinaryCode objects, that
hold the binary code, do it.
%p
But with the binaries there is an extra twist to this. The BinaryCode object has a
header (the type and next), which are not code. So the code has to jump over this
header at every end of an object.
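%p
A rough sketch of that chaining in ruby, under the assumptions that every object is
16 words, that the header takes 2 of them, and that the last code word of every
non-final object is reserved for the branch over the next header. The class below
only borrows the BinaryCode name, and the symbol :branch_over_header is a stand-in
for a real branch instruction.

```ruby
# Sketch under assumptions: 16 word objects, 2 header words (type, next),
# and one code word per non-final object reserved for a branch.
class BinaryCode
  WORDS = 16
  HEADER = 2                     # type and next
  CODE_WORDS = WORDS - HEADER    # 14 words left for code

  attr_reader :code, :next_code

  def initialize(code, next_code)
    @code = code
    @next_code = next_code
  end

  # Split a flat list of instructions into a linked list of objects.
  def self.chain(instructions)
    return new(instructions, nil) if instructions.length <= CODE_WORDS
    head = instructions.first(CODE_WORDS - 1)   # keep one word for the branch
    rest = instructions.drop(CODE_WORDS - 1)
    new(head + [:branch_over_header], chain(rest))
  end

  # number of objects in the list, starting from this one
  def length
    1 + (next_code ? next_code.length : 0)
  end
end

main_code = BinaryCode.chain((1..30).to_a)
```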
%p.full_width
=image_tag "binary_codes.jpg"
This is demonstrated by the object dump above. If the assembly is scary, don't
worry, just look at the top left, address 16260, where the BinaryCode object for
the main method starts. You see the first two words are separated, as i said the
type and next (see the 162a0 value is the address of the BinaryCode on the right).
%p
Mainly i wanted to demonstrate the jump, which is the last instruction on the left
side. The
%b b
stands for branch and the address 162a8 is exactly the code of the next BinaryCode,
ie just after the header.
%p
You can just make out on the bottom left, that this is in fact the code for the
"Hello World" , as it jumps to the (Word_Type.) putstring.
%h2 Next steps
%p
Hello World is of course a very small step and work will continue on making other
things work. On the Interpreter side, many more things, like loops, conditionals,
maths and dynamic dispatch already work.
%p
Luckily, part of this push was to make the Interpreter a platform similar to the
Arm. So it too has BinaryCode and works with addresses, not objects as before.
In short the differences between Interpreter and Arm have shrunk, and there is
good reason to believe that much will work quite soon.
%p
Next i will build a testing framework to test the same code on Interpreter and
Arm and see that both work. And specifically get all those working Interpreter
tests working on Arm.
%p
I think then it is time for some benchmarks. It has been a while since
=link_to "i made some," , "/misc/soml_benchmarks.html"
and they were quite promising. Especially loops of the Hello World and
Fibonacci.
%p
On the further horizon i was planning for continuations next, probably with
a small rework of the return mechanism (unified return sequence).

@@ -0,0 +1,184 @@
%p
Of course the
=link_to "architecture" , "/rubyx/layers.html"
gives a good overview of the system as it is. But it does not explain how we got
there. And sometimes knowing the journey makes it easier to understand where
one is. So i shall try to highlight the four or five main
%h2 Macbook + Ruby == Raspberry Pi
%p.full_width
=image_tag "mac_plus.png"
When i bought my first 30Euro Pi i noticed that ruby is unusable on it.
Looking at how slow ruby actually is, it occurred to me that ruby just about turns
the Pi into my first 286 laptop (running at 6MHz), which is the same as turning my
MacBook Pro into a Pi.
%p
Of course, while working on web-apps, which can be parallelized so easily, and with
a company paying both developer and hardware, the std ruby argument holds.
But since i wanted to use my pi for demanding projects something had to be done.
%h2 Judy, the importance of cpu cache
%p
=ext_link "Judy" , "http://judy.sourceforge.net/"
is a really really fast digital tree, kind of hash. I actually built a memory
database with it that was also really really fast. When connecting it to rails i
ran into the above problem, the niceties of ActiveRecord (ruby) brought performance
of my extension (c) down by a factor of 40.
%p
But anyway, the point is that Judy's speed is based on a radical optimisation for
cache lines (and key compression). This means all data structures are exactly a cpu
cache line big. As i learned, cpus do not access memory in word sizes, but instead, always
a cache line at a time. This basically led to ruby-x's memory model, which is
fixed sized objects, multiples of a cache-line.
%h3 Microkernel
%p
As a young engineer, i thought, as my peers, that Linux (then 0.93) was the greatest
thing. Only much later did i learn that it is just a copy really, and the reason
it got popular was not technical, but licensing (Same reason it is in Android i
believe). The reason it stayed popular is inertia, in other words writing device
drivers is hard.
%p
=ext_link "Synthesis," , "https://en.wikipedia.org/wiki/Self-modifying_code#Massalin's_Synthesis_kernel"
=ext_link "L4,", "https://en.wikipedia.org/wiki/L4_microkernel_family"
and
=ext_link "Minix," , "http://www.minix3.org/"
are good proof that the superior architecture is the Microkernel. Eg L4 can run
another OS as an application with about 4% performance degradation. Or Minix can
recover from a device driver failure.
%p
This, plus the fact that we have bundler, brought me to the approach that:
If you can leave it out, do. Much of the functionality that is in ruby (mri),
will never be in RubyX, but rather supplied by gems.
%h3 System interrupts
%p
In the beginning i was of course contemplating how much of c based systems i would use.
Like LLVM, which is of course a great tool, though made for c-ish applications.
Or libc, which again is really for c apps to access the kernel.
%p
The sheer size of the functionality one inherits almost swayed me. Even though i had long
since determined that one of ruby's biggest flaws, its std-lib, came from modelling
and using libc.
%p
Then i learned assembler and looked at libc implementations and learned what i
believe made the decision: Kernel calls are not really calls at all. They are
software interrupts, which basically means you fill some registers, flick the
switch, and the next instruction you can collect the result in a specified
register. This may look like a call, and of course, by using libc it is presented
as a call, but it is not. It is a very simple set of assembler instructions.
%p
For me this meant there is very very little benefit in using c, either in its
libc form, or assembler/linker (i had found a ruby gem to do that easily),
or, maybe most importantly, the c calling convention. All of these things
are great for c programs, but they are just not made for dynamic languages
and that would have brought a whole slew of problems.
%h3 Return address is a parameter
%p
In C calling (probably other languages too), the return address is determined
in the callee, usually by pushing the pc to the stack. But Arm has a different
way, an instruction called Branch With Link, that actually stores the pc in a
separate register called Link.
%p
And this made me realise, that really, the return address is always a parameter
to a function. Like other parameters it uses a register. It is the C way to
hide this implicit parameter, much in the same way it is the oo way of hiding
the self parameter.
%p
By this time i was already coding some rudimentary calling convention and it
did not take long to verify this in code. It is in fact quite easy to determine
the return address at compile time and pass it explicitly. (Easy if one does not
use a c linker that is)
%h3 OO calling convention
%p
Another thing that deterred me from C is the way they use the stack. It is so
completely not oo and cryptic. It is in other words very difficult to unwind,
and almost impossible to implement closures.
%p
Since the assembly had progressed easily, i made performance tests with an oo
calling convention, and determined that the price would be
=link_to "about 50%." , "/misc/soml_benchmarks.html"
Since currently the gap is more than an order of magnitude, this seemed ok,
given that it would make the compilation process so much easier.
%p
The resulting calling convention uses normal Message objects that form a linked
list, rather than a stack. Since they are completely standard objects, manipulation
both at run and compile time is totally integrated.
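%p
As a sketch of what that means, here is the linked list idea in plain ruby. The
field names are invented for the illustration and are not the rubyx Message layout.
The point is only that the return address travels as an ordinary field and that
unwinding is a simple walk along the list.

```ruby
# Hypothetical sketch: the call "stack" as a linked list of ordinary
# objects. Field names are illustrative, not the rubyx Message layout.
Message = Struct.new(:caller_message, :return_address, :receiver,
                     :method_name, :arguments, :locals)

# Calling creates (or reuses) a message that links back to the current one.
# The return address is passed in like any other parameter.
def send_message(current, receiver, method_name, arguments, return_address)
  Message.new(current, return_address, receiver, method_name, arguments, {})
end

# Unwinding is just following the links back to the start.
def unwind(message)
  names = []
  while message
    names << message.method_name
    message = message.caller_message
  end
  names
end

main  = send_message(nil, :main_object, :main, [], nil)
outer = send_message(main, "Hello World", :each_char, [], :after_each_char)
inner = send_message(outer, "H", :putstring, [], :after_putstring)
trace = unwind(inner)
```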
%p
Function calling has been working for years, but recently i cracked dynamic method
dispatch too, which was not that hard really. Currently the work is progressing to
blocks, and the clear structure does help a lot.
And while exceptions (or bindings) are not started, i think they will come with
relative ease (compared to the c way), since the structures are very simple.
%h2 Decisions that affect the future
%h3 Metasm
%p
I gave
=ext_link "Metasm" , "https://github.com/jjyg/metasm/"
several long looks. After all it has assembler and disassembler for at least
10 cpus, and support for several binary formats, including elf. The
reason not to use it was not that it is big (including much we don't need).
But rather that it is unmaintained and unresponsive.
%p
It would be great to split all that code into several gems, a core and one
per cpu / binary format / assembly, disassembly. Only the core would need
to be integrated into rubyx, and one could just use the platform specific
gems. But I am not the one to do this work, was the decision.
%h3 Lock free Concurrency
%p
Concurrency will have to be part of the core, even if it is just to get a gc
working. The work that
=ext_link "Massalin did" , "http://valerieaurora.org/synthesis/SynthesisOS/abs.html"
already showed how effective lock free
concurrency is, but Dr Cliff took it into the modern (java) world by
publishing a
=ext_link "lock free hash" , "https://www.youtube.com/watch?v=HJ-719EGIts"
that he later ran on some crazy machine with 800 cpus.
%p
I am not sure whether it will be better to port the java code, or try a
=ext_link "diy" , "https://preshing.com/20130605/the-worlds-simplest-lock-free-hash-table/"
version. And of course to even get started on this rubyx will need the
compare and swap primitives that underlie the lock free approach.
But all in due time.
%p
The actual concurrency i am envisioning as two os-threads per core. One for kernel
interaction and one for normal operation. Kernel calls
would never be executed on the second, but always queued on dedicated kernel
threads. The non kernel threads would be used to run fibers.
If we insert some little check into the calling, switching could happen very often
and because of the linked list approach would be very very fast. And because of
the offloading of kernel calls would never stall (completely). This way one can
achieve the sort of millions of fibers erlang is known for.
%h3 House keeping and garbage collection
%p
Often, in systems that are designed to be collected, the base object has some
field to support this. This was deliberately left out. RubyX only has objects,
so the field would have to be an Object, which is too much overhead.
Or there would have to be a dedicated instruction to deal with a raw data word
which is too much overhead in another way.
%p
Gc will be a completely external gem, so experimenting will be easy and
encouraged. Gc implementers will just have to use their own structures to keep
track of the state that they need. Judy style digital trees can do this by actually
using less memory than a field would use, but handcrafted bitfields will also be good.
%p
The actual marking phase should be relatively easy, as the world is known completely.
There are no grey stack areas where one has to guess, as all objects are typed
and the type determines which slots are objects. Not even registers are grey
area, as we switch cooperatively; only the Message register is ever valid.
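%p
A sketch of such a marking phase, assuming a much simplified object model: Type and
Obj below are stand-ins invented for this example, and the mark state is kept in a
ruby Set outside the objects, in the spirit of the external structures described
above.

```ruby
require "set"

# Sketch: every object has a type, and the type lists which slots hold
# object references. Mark state lives outside the objects, in a Set.
Type = Struct.new(:name, :object_slots)
Obj  = Struct.new(:type, :slots)

def mark(root)
  marked = Set.new.compare_by_identity
  pending = [root]
  until pending.empty?
    object = pending.pop
    next unless marked.add?(object)     # add? returns nil when already marked
    object.type.object_slots.each do |slot|
      child = object.slots[slot]
      pending << child if child
    end
  end
  marked
end

word_type    = Type.new(:Word, [])                       # no object slots
message_type = Type.new(:Message, [:receiver, :caller])

hello  = Obj.new(word_type, { length: 11 })
main   = Obj.new(message_type, { receiver: hello, caller: nil })
main.slots[:caller] = main                               # cycles are harmless
orphan = Obj.new(word_type, {})

live = mark(main)
```

%p
Nothing has to be guessed here: the type alone says which slots to follow, and
anything not reached (the orphan) is free to be collected.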
%p
In fact, all this makes even moving objects relatively easy. Though there is of
course the effort of going through the world to find all backlinks. But if that is
done during a mark, it comes at relatively low cost.
%p
All in all a very interesting topic, and surely someone will come up with some
great idea. And of course there will have to be the most rudimentary one from
the start, just enough to work and give someone motivation to improve it.

@@ -0,0 +1,147 @@
%p
Basic enumerator style blocks were not as bad as i thought. Admittedly i thought they
would be close to impossible, so compared to that a few hundred commits are really
quite little.
%h2 Different kinds of blocks
%p
To start with let me lay the ground. In ruby code, i see blocks used in basically
two kinds of ways. I call the first one the
%b implicit block
which is what you do when using iterators/enumerable. Ruby lets you pass the
block as an
%em implicit
argument. This is the kind that is implemented and that i will go into detail about.
%p
The other kind i shall call
%em explicit
is when you define blocks as variables, either with lambda or proc syntax.
As a slight complication implicit blocks may be captured and used in the same
way as explicit blocks, but let's forget for a moment that i said that.
Explicit blocks are good for a more functional style of programming and used much
(much?) less. Also they are the ones that will need some expansion on what we have now.
%h2 Implicit Block properties
%p
Since i never had to implement blocks before, it was a bit of a surprise how simple
it was.
After dynamic dispatch was done i had planned to improve the std library. But i
quickly ran into loops, and doing loops without blocks in ruby is just too weird.
So i started on blocks instead, which i must admit i thought would be very (very)
difficult.
%p
But then i found that actually blocks are very similar to methods, just with a
twist:
As it turns out, the implicit block calling basically guarantees that the caller's caller
is the method where the block is defined. This means one knows all local variables and
method args, while compiling the block. And can thus resolve all variable access
at compile time, who knew!
%p
Ok, just in case that slipped off too quick, i'll say it again: For the
%em implicit
blocks, all variables (local/args/instance) are statically known at compile time.
And since basic control structures (if/while) are obviously the same inside
a block and method, the whole problem of blocks reduces to variable access.
%h2 Base classes
%p
When we have things that are the same in oo, the big oo hammer comes out: inheritance.
So i made a base class for Block and Method, called Callable. And similarly
a base class for MethodCompiler and its new equivalent BlockCompiler, called
CallableCompiler.
%p
The reason i mention this much detail is just because i was so surprised how little
difference there is between the derived classes. In the case of Block and Method over
95% of the code is in the base class, and for the compilers it's still over 80%.
It really is only that scope resolution.
%p
The difference is that a Method resolves a variable in its own frame, whereas a Block
resolves it in the frame of the caller's caller, ie where it was defined. And since
we have a nice and simple calling convention, it is just two extra instructions per
variable access.
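%p
The two lookups can be sketched side by side in ruby. Frame is an invented stand-in
for whatever holds the locals in the real system, so this is an illustration of the
idea rather than the generated code. The two hops in the block case correspond
to the two extra instructions.

```ruby
# Sketch with invented names: frames are plain objects linked by caller.
Frame = Struct.new(:caller_frame, :locals)

# a method looks in its own frame
def method_variable(frame, name)
  frame.locals[name]
end

# an implicit block looks two hops up: the block was called by the
# yielding method, which was called by the method defining the block
def block_variable(frame, name)
  frame.caller_frame.caller_frame.locals[name]
end

defining = Frame.new(nil, { total: 10 })       # method that contains the block
yielding = Frame.new(defining, { index: 0 })   # eg each, which yields
block    = Frame.new(yielding, {})             # the running block

own   = method_variable(defining, :total)
found = block_variable(block, :total)
```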
%p
So, in the hope of proving how crazy fast it would be, i started on benchmarks.
But here we come to another story. RubyX does consume memory quite fast, but has
no allocation yet. So i could fix it by creating megabytes of shell objects at
compile time, or bite the bullet and implement
%b "new".
Since i'll do the latter, we have to wait for the numbers.
%h2 Dynamic Blocks
%p
Since i pushed the Procs aside up there, i just want to say that this was not without
consideration. I think the solution to Procs is not too difficult and the current state
can be expanded to handle them thus: We need to check the method of the caller's caller
when entering the block code. If the implicit assumption holds, the code can execute.
If not, we need to jump to an alternate version of the code that does the variable
resolution dynamically.
%p
Basically that means compiling two alternate versions of the code and having the switch
when entering the block code. Again though, since the calling convention is simple,
the runtime resolution is relatively simple. And it can even be coded in ruby,
since we can call out to a method from the generated code.
%h2 Yin and Yang of Methods and Blocks
%p
Sending for methods is sort of equivalent to yielding for blocks. The two use the exact same
calling convention. In fact yield is almost identical to ".send", so when the time
comes to do that, we're almost set.
%p
In methods we have the static case, where the method is known
at compile time. And then we have dynamic dispatch, where the method is resolved
at run-time and called dynamically. But in both cases variable resolution is
completely compile-time.
%p
And then we have blocks with the "static" version, where the block that is passed
is known at compile-time, but only to the caller, not the callee. So the callee needs
to invoke (yield) dynamically, but still the variable resolution is static (compile-time).
%p
And then the dynamic block version (Procs) where no resolution is necessary to
call the Proc (since it is given as a variable), but instead the variables
have to be resolved at run-time.
%p
To me they are sort of reversely symmetric. I'll have to try and make a diagram
one day.
%h2 Side note on Builder
%p
Since i started with the builder and the associated dsl, i got more and more into it.
The dsl provides quite readable code, there is sort of assignment and a few shortcuts
to other risc instructions. But at the risc level one is really quite busy shuffling
data from here to there, so the "assignment" which covers
=ext_link "RegToSlot" , "https://github.com/ruby-x/rubyx/blob/master/lib/risc/instructions/reg_to_slot.rb"
,
=ext_link "SlotToReg" , "https://github.com/ruby-x/rubyx/blob/master/lib/risc/instructions/slot_to_reg.rb"
and
=ext_link "Transfer" , "https://github.com/ruby-x/rubyx/blob/master/lib/risc/instructions/transfer.rb"
helps a lot.
%p
Because of this, i have now rewritten all of the to_risc functions in Mom, which generate
risc instructions, using the dsl. Also the builtin code (including div10, shudder) uses
the dsl. It is
%em much
easier to understand, and gets rid of a fair few crutches i created on the way.
It's even
=link_to "documented", "/rubyx/builder.html.haml"
%h2 Future
%p
As i said, what i really would want to do now is some benchmarking.
At least i got the Fibonacci of 30 to work. That's something! It took 7632 instructions.
That doesn't sound too bad, and is in fact twice as fast as mri (theoretically).
That means 1000 times fibo(30) per second on a Pi.
%p
Alas, we need
%b new
first, even to count to 1000. That's not too bad in itself, but it does need allocate.
That in itself is also not too bad, until you get to that else case,
where the memory has run out.
%p
Then there is a mmap syscall and ... what? I guess i'll find out.
%p
A note for the far future: Since we now have different compilers, and we will need
alternative code paths before long, inlining doesn't sound so impossible anymore
either. Just another compiler with different scoping rules, another type test, another
path.