UP | HOME

Programming

Just write everything down

The main problem with programming is one being totally overwhelmed with irrelevant details, especially in C-like imperative languages.

The solution is the classic layered design, with layers of DSLs on top of one another. These DLSa are just functions, standardized interfaces and protocols. This is how TeX (and LaTeX) has been written and other remarkable things like the whole X Window System.

The complexity, however, is still overwhelming. The final solution is to write everything down as informal specifications, formal specifications and even discrete math.

Informal specs should written be in org-mode, formal specs in pure functional languages or even specialized theorem provers such as Coq or Agda (both have an embedded pure-functional language which can be used for a quick prototyping).

Actually, the real problem is that you will forget everything you have thought. So it pays to write down as much as you can.

Writing is also the best known from of transmitting a knowledge through the time and to other people, including to an older you.

Writing itself is a very challenging and difficult task, since an art-level clarity and brevity is required, similar to the best mathematical texts which accompany some proofs.

Test Driven Development

The consistent practice of writing tests before code is a rare non-bullshit advice from the world of project management.

It helps you to focus on a higher-level concepts of interfaces (type-signatures) and modules and check both the use side and the definition side.

ADTs and corresponding modules are at the very core of programming (within every paradigm) so to beginning with and focus on them is indeed a valuable habit to develop.

Write the tests - your “use side” - before an implementation, and check if everything is in accordance with your specifications. This is an outline of actual design process, insead of day-dreaming and hand-waving.

Math

It turns out that before thinking about actual interfaces (and type-signatures) even higher-level and mode abstract mathematical thinking (and writing) is required.

The two DLSs - of The Set Theory and of the First-Order Logic - are at the code of all modern mathematics. Other specialized “logics” are occasionally useful.

Always write all the relevant math down (in a LaTeX subset of org-mode). This will pay you back immensely. First, because you will forget it all in a week. Second, because other people can check and even verify your reasoning. This is priceless.

UNIX Philosophy + Functional Programming

It always looks simple in retrospect. Just take the best parts of the UNIX Philosophy (do just one thing but do it well, use text streams, files, pipes and sockets) and use SML/NJ, Erlang, Haskell or Ocaml instead of C or C++. Well, SML is rather a monument of the glorious times long gone, so Ocaml or Haskell but never re-implementing the wheel, and just FFI’ing everything.

If you are trapped in the JVM ecosystem, then there is Scala 3, which doing almost everything right, except, of course, the fucking dependencines and 3 versions of the compiler and standard library in a smallest setup. Well, with Haskell that is even worse.

Text streams and FP maps very well with so-called Micro-services architecture - as long as you are pure functional (Same input - same output. Always.), which implies referentially transparent, you can really mix and match “services” just like a functional programmer mix and match functions. Yes, for real. Coursier, Kafka or Twitter toolchain are the actual use cases.

Just getting rid of imperative code (aliasing problems, lifetimes, locks, semaphors, etc) and verbose, boated sources is already a bliss and an unexpected return of the joy of programming which programmers of the Golden Age (Common Lisp, Smalltalk, Standard ML, Miranda, Haskell) used to feel. Happy, happy, joy, joy (this is from the MIT Scheme).

Using a high level, mostly functional language in UNIX instead of C is (unfortunately) not my idea. Bell Labs had a research project which resulted in development of the Basis Library and a couple of books. Today, it seems that Ocaml picked everything up where they left. Just like it was with SML, a major theorem proover (Coq) is the main project for the whole ecosystem, and it has just the standard library in dependencies (well, almost).

Another thing to add is what is loosely called reactive programming which is just triggers on events from an event stream. This is already a mess and a contenst in blog post idiocy, but one could use the right abstractions (which are streams and typed events to pattern-match upon). Some Haskell’s graphics and animation libraries got it right.

Any kind of immutable databases, ledgers, etc. could naturally be developed using this approach. Datomic of Rich Hickey is an attempt for an immutable store. XTZ is a blockchain implementation in pure Ocaml, while others still got stuck with C and C++.

The Haskell Platform (GHC + a small set of well-designed libraries) and the Ocaml Ecosystem (the compiler, tools and the libraries via opam) are production-ready environments. One always could enrich them with Coq, Agda and TLA+ to reach the level of an art (instead of that learn to code in javascript or python fucking abomination).

Algebraic types and pattern-matching

Anything could be build out of Algebraic Data Types.

There are AND-types (A, B - product-types), OR-types (A | B - sum-types) and lambda types (A -> B - correspond to exponents).

Everything becomes self-evident when we think of types as finite sets and consider all the individual unique mappings (arrows) between the elements of any two such sets.

We could pattern-match on data-constructors which conceptually are the same as type-tags for every value.

Logically, we can imagine that each value has an implicit type-tag attached to it. Type-tags are as old as early LISPs and this, in turn, corresponds to unique shapes of common molecular structures in biology.

A list is a recursive sum-type. So are trees. A record is a product type with tagged (named) values (parts). A table is a mapping (just like a function) from “rows” to “columns”.

One could think of a table as a list of tuples (an ordered sequence of fixed-size “sequences”). The early LISPs has been built this way.

The Relational Model

Everything “fits together” because conceptually, at the most abstract level it is all just sets of tuples, which is the basis of both Functional Languages and the Relational Model.

The crucial principle is that relations are just sets of tuples, and the whole relations are immutable values, with all the usual nice properties and implications.

Everything is “naturally” typed with Algebraic Data Types - a record is just a “named” tuple, which, in turn, is a product-type. Sum-types provide all the required enumerables.

There is a simple and obvious relation algebra based on the algebra of the Set Theory, with the closure property.

Like any other values, these relation values can be passed to and returned from pure functions, used within persistent aggregates, put into immutable trees, etc.

This maps well to persistent data structures (trees and maps) of functional languages.

Lists of records (“named” tuples) where the actual ordering is irrelevant (unless deliberately sorted) is an obvious transition from the abstract universe of Sets to the concrete reality of actual representations and languages.

Managing Complexity

As long as we are able to model and formally specify our domain in the language of mathematics, which is basically 2 DSLs -the Sets and the First Order Logic and could derive a simple domain-specific /algebra, the complexity is under our control.

Zooming in and out

This is the most important thing in managing complexity - the ability to zoom in from a mathematical model to actual language types and implementations inside modules and back to the most general and abstract notions.

This ability is the key to success and the biginning of an art.’

Formal systems

As long as we are writing math we are able to formally verify it.

TLA+, Agda and Coq are the tools of the trade. Notice that the larter has an embedded pure functional language with algebraic types and pattern matching, which is, again, the universal core language.

Languages

The modern descendants of the ML-family of languages are an “ideal” choice.

Haskell is particularly good, because it restricts and has the most advanced type system.

Ocaml shines with its module system and syntactic extensions.

Scala 3 is also good, given that we use our own simple libraries (made by ripping off the essence from an over-“engineered” bullshit).

Building blocks

There are basically 3 things that could happen with two (one more than one) values (or whole sets, or types)

  • a, b (which corresponds logical AND also, a product)
  • a | b (which corresponds logical OR else, a disjoint union)
  • a -> b (which corresponds to a logical implication or to a single STEP of computation (or evaluation of an expression), which can be abstracted out as a function).

This, by the way, is when we transform values into the same type or into another of the same kind. What if we transform them into something completely different?

  • a -> f b - this is “putting into” or “sending” (no way to get back)
  • f a -> a - this is “tacking from” or “receiving” (no way to put it there) These are duals of each other.

One could think of a -> f b as extending an immutable structure, and, conversely, f a -> a as traversing an immutable structure

Notice that with the first two arrangements the order is irrelevant and the two views are isomorphic, while the third relation is directed and b -> a is not the same, but there could be a such inverse function (or operation) within a particular context.

Adding the universal notion of nesting we could have infinite lists (which is a recursive structure or a nested type - a sum of a product).

Monads

A Monad (as an Abstract Data Type) has to form an algebraic structure called Monoid with respect to its operations (defined as a standardized interface - >>= (composition) and return).

This interface, which has to form an instance of a Monoid with respect to the binary operator >>=, is a generalized abstraction barrier for composable entities, or values of a particular type.

It is defined as a type-class, implying that any particular type (an ADT) which implements the interface (and ensures Monoidal laws - associativity and identity properties) becomes an instance of a Monad.

Now there is the trick. Behind this generalized abstraction barrier (of composable values) could be a value of a sum-type, of a product-type, of a function-type or any nested combinations of these. So there are different kinds of Monads.

Or, operationally, once we can implement the minimal interface (>>= and return) for a given ADT, it becomes an instance of a Monad. To be an instance of a /Monad /is to (also) implement the interface.

A List is a such particular instance, where >>= corresponds to append, and the [] is an identity element.

A parameterized sum-type behind this abstraction barrier is a common idiom.

  • promises and other kids of asynchronous operations
  • computations which may fail in general
  • values which could be empty or contain nothing

or even an arbitrary sum-type (an a | b) could be made into an instance of a Monad, which means the ADT forms a Monoid with respect to some associative composition operator abstracted out as >>=.

Notice this very careful wording.

In decent languages the static type system guarantees that this abstraction barrier cannot be violated at runtime.

Additional type-system tricks, such as use of so-called existential types with phantom parameters makes a type-level guarantee that a value cannot escape from a given context or simply cannot be accessed in principle.

Lists, Records, Queues, Streams, Logs

  • Lists are position ordered
  • Records are by name (associative)
  • Logs are lists of strings which represent records (containing a timestamp)
  • Queues are async messages without knowing anything about the receiver.
  • Streams are serialized events (by sense organs of perception).

Which a by name pattern one don’t have to know what else is there. This implies new names (or keys) could be added later without affection anything which is already there.

Author: <schiptsov@gmail.com>

Email: lngnmn2@yahoo.com

Created: 2023-08-08 Tue 18:38

Emacs 29.1.50 (Org mode 9.7-pre)