Projects
The open-source software ecosystem is, perhaps, the most complex social system that has ever emerged. Financial markets may seem even more complex, but I have an intuition that they have just a few tricks repeated over and over again.
The fundamental problems are complexity and social dynamics (hell is other people). Most of the time the “other people” create unnecessary complexity, redundant abstractions, meme-trends and even meme-technologies, and low-effort crap in general.
On the other hand, some (very few) create good stuff, like the Linux kernel or the GNU toolchain or the LLVM project or Firefox or Chromium or PyTorch or Mesa or OpenSSH. But these are the rarest exceptions. Almost everything else is low-effort crap, the junk food of software.
The great classic projects of the past have shown us that things could, in principle and in actual reality, be done way better, qualitatively better, orders of magnitude better. Look at what Lotus 1-2-3 for DOS is, or Open Genera, or MIT Scheme, or SML/NJ, or GHC. But to do this requires a very different kind of mind - disciplined, backed by good habits and trained in a classical education, which is basically concrete mathematics, logic and non-bullshit philosophy.
Anyway, what can we do with our own “projects” within this open-source software ecosystem?
Well, we have to understand it in a principle-guided way - its workings and its social dynamics - and rely on the universal notions of abstraction, partitioning and composition, which are at the core of everything.
The proper abstractions (properly captured and adequately generalized concepts) are the foundation. We absolutely have to be grounded in reality, in What Is. Without this, everything will go wrong in the way of this or that socially constructed delusion, self-deception, wishful thinking, cognitive biases, etc.
The mantra is: bullshit, bullshit everywhere. Socially constructed and maintained bullshit.
So. How do I know that the particular version of gcc in my Gentoo system is good? I absolutely cannot and do not want to examine its implementation details - how well it does SSA, code generation, etc. I do not have enough mental capacity in the first place.
The only thing I could do in this situation is to “trust them”, which is, basically, “to believe” instead of knowing, because it is so difficult and time-consuming to know. This is, of course, unacceptable. Let the idiots believe.
A better approach is to understand the dynamics and to run experiments - to test. Yes, the Dijkstra meme – any amount of testing, in principle, cannot guarantee the absence of problems, only show that there are problems. However, even sloppy experiments are better than guesswork and “trust”.
If I have compiled from source both Chromium and Firefox (and all the dependencies) with this version of clang++ and it works, then I could conclude that I have done at least some coverage testing - the “common paths” through the code produced satisfactory results.
So, I have applied several principles of systematic testing - I have treated the software artifacts both as a black-box abstraction (whose details I do not even want to know) and, to some extent, as a glass-box abstraction, where I can test the major “paths” through the code. This is better than nothing.
Systematic testing, without knowing the implementation details, is the way.
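As a toy illustration of this black-box attitude (nowhere near compiling Chromium, obviously), here is a minimal Haskell sketch that drives the compiler as an opaque artifact and only checks the observable behaviour of a few “common paths”; the sample file name, the output path and the expected output are all hypothetical.

```haskell
-- smoke.hs - treat the toolchain as a black box and exercise a few "common paths".
-- Assumes clang++ is on PATH; the sample source and expected output are hypothetical.
import System.Exit (ExitCode (..), exitFailure, exitSuccess)
import System.Process (readProcessWithExitCode)

-- One "common path": compile a small program and check its observable output.
smokeTest :: FilePath -> String -> IO Bool
smokeTest source expected = do
  (cc, _, cerr) <- readProcessWithExitCode "clang++" ["-O2", "-o", "/tmp/smoke-bin", source] ""
  case cc of
    ExitFailure _ -> putStrLn ("compilation failed: " ++ cerr) >> pure False
    ExitSuccess   -> do
      (rc, out, _) <- readProcessWithExitCode "/tmp/smoke-bin" [] ""
      pure (rc == ExitSuccess && out == expected)

main :: IO ()
main = do
  ok <- smokeTest "hello.cpp" "hello\n"  -- hypothetical sample program
  if ok then exitSuccess else exitFailure
```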
Now let's think about the code.
Yes, in an idealized world we want the most competent and well-educated people to take responsibility for creating and maintaining the most complex, difficult and tedious parts of the systems, and to just provide us with an easy, conceptually right, stateless high-level interface (like arithmetic).
Not just that, we want them to improve the internals, which we do not understand, while keeping the interfaces intact, so each new version will improve our experience, support new architectures and the latest optimization research, and take advantage of all the modern and emergent technologies.
Well, the LLVM project and GCC do exactly this, and Google just recompiles its code with a new version of clang++ when it is ready. This is, however, the rarest wonderful miracle (and Google fucked itself in the past by sticking to a stale g++).
In reality, however, maintainers could suddenly decide to re-implement everything using some fucking async framework, to drop support for installing local packages because some degen submitted a new PEP, to require fucking snapd or flatpak or some other redundant crap, and whatnot.
What about our own code?
In the ideal world of our naive imagination, we want to have just the right concepts and corresponding abstractions, optimally implemented as a minimal, just-right implementation (in Haskell or OCaml), based on a formal specification that has been checked with TLA+.
OK, at least we want to extract just the right, definitive implementations from the well-established projects, package them as shared libraries or loadable modules, wrap a high-level API around them, and use them in our code.
But what happens when the project suddenly decides to change something because of some new meme? Well, we have to watch out. Regular testing should break due to the incompatibility and signal to us that we are fucked.
This, by the way, is a reasonable strategy. This is exactly how we rely on standard libraries, such as glibc or libstdc++. These are collections of specialized modules, arguably doing the right thing (which we cannot understand).
I would argue that at least libc++ does the right thing, because so many businesses rely on it.
You, however, are not a business, not even an organization; you are some lonely wolf of the steppes. What would you do?
Well, you have to learn when to know (and what to know) and when to delegate (and what to delegate).
It seems that you have to know the underlying principles and the underlying mathematics (and assumptions) and delegate all the implementations, just as you do with compilers or standard libraries.
In some rare cases you might want to be sure and re-implement (by ripping off other people’s code) some critical parts, or at least repackage them as a library and vendor it as a dependency.
The canonical example is the crapto ecosystem. The “design” by unqualified and often uneducated amateurs is crap, the code is crap, but it somehow works (just as PHP3 works).
In this situation I really want to reuse the code and understand the principles, concepts and math, to just do the right thing using crappy tools. Basically, I want to send just the right (well-formed and checked for correctness) messages and to react adequately and in a timely manner to the relevant changes.
So I could either extract sources from, say, bitcoin (the reference implementation) and repackage them as a set of small shared libraries (similar to what abseil-cpp does), or, even better, have an FFI to high-level functions, for a high-level functional language or even a command-line tool (to be called from a script).
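To make the FFI route concrete, here is a minimal Haskell sketch, assuming some verification routine has already been extracted from the reference implementation and repackaged as a shared library. The library and the C function btc_verify_signature are hypothetical stand-ins for whatever the real extracted interface would look like.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
-- A thin, high-level wrapper over a hypothetical C function extracted from a
-- reference implementation and repackaged as a shared library.
module BtcVerify (verifySignature) where

import qualified Data.ByteString as BS
import Data.ByteString.Unsafe (unsafeUseAsCStringLen)
import Foreign.C.String (CString)
import Foreign.C.Types (CInt (..), CSize (..))

-- Hypothetical C signature:
--   int btc_verify_signature(const unsigned char *msg, size_t msg_len,
--                            const unsigned char *sig, size_t sig_len,
--                            const unsigned char *pubkey, size_t pubkey_len);
foreign import ccall unsafe "btc_verify_signature"
  c_verify :: CString -> CSize -> CString -> CSize -> CString -> CSize -> IO CInt

-- The high-level, stateless interface the rest of the code sees: bytes in, Bool out.
verifySignature :: BS.ByteString -> BS.ByteString -> BS.ByteString -> IO Bool
verifySignature msg sig pubkey =
  unsafeUseAsCStringLen msg $ \(mPtr, mLen) ->
    unsafeUseAsCStringLen sig $ \(sPtr, sLen) ->
      unsafeUseAsCStringLen pubkey $ \(pPtr, pLen) -> do
        r <- c_verify mPtr (fromIntegral mLen) sPtr (fromIntegral sLen)
                      pPtr (fromIntegral pLen)
        pure (r == 1)
```

The rest of the program links against the (hypothetical) shared library and only ever sees verifySignature; when the upstream code churns, only this thin boundary has to be revisited.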
This is the level of understanding one should have – to know the underlying principles, proper concepts and math, and therefore to know what is relevant and what isn’t, what to ignore and delegate and what to pay attention to.
The only problem is that extraction and refinement of such knowledge takes time. Everything decent takes time, unfortunately.
The principle, however, is solid - remain at the highest conceptual and mathematical level (sets and logic) and dissect the crap (along the abstraction barriers) into the most relevant parts and interfaces.
Not just that, one has to zoom through the layers of abstraction, from the underlying concepts of the domain to the actual representations of the data structures used in the protocols.
So, you still want to do this? For free?
OK, the keywords are FFIs and high-level declarative embedded DSLs (which would otherwise require their own interpreters).
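To give one hypothetical flavour of such an embedded DSL (nothing here corresponds to any real protocol), a declarative description of a wire message can be hosted directly in Haskell, so the host language supplies the parser, the type checker and the evaluator, and only a small interpretation into bytes has to be written:

```haskell
-- A tiny, hypothetical embedded DSL for describing wire messages declaratively.
-- The host language provides parsing, type checking and evaluation, so the DSL
-- needs no interpreter of its own - only this small rendering into bytes.
module MsgDSL where

import qualified Data.ByteString as BS
import Data.Word (Word32, Word8)

-- Declarative description of a message field.
data Field
  = U32 Word32            -- fixed-width unsigned integer, big-endian
  | Bytes BS.ByteString   -- length-prefixed byte string

type Message = [Field]

-- One possible "interpretation": render the description to bytes.
encode :: Message -> BS.ByteString
encode = BS.concat . map enc
  where
    enc (U32 n)    = BS.pack (wordBE n)
    enc (Bytes bs) = BS.pack (wordBE (fromIntegral (BS.length bs))) <> bs
    wordBE :: Word32 -> [Word8]
    wordBE n = [fromIntegral ((n `div` (256 ^ i)) `mod` 256) | i <- [3, 2, 1, 0 :: Int]]

-- A hypothetical message, written declaratively and checked by the host compiler.
ping :: Message
ping = [U32 1, Bytes (BS.pack [0xde, 0xad, 0xbe, 0xef])]
```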
But wait, wasn't it the Common Lisp guys, and then the Haskell guys…? Yes.