Modules

The old-school guys (and girls!) from the golden age of programming (the 70s, 80s and early 90s - before Java) usually had a strong math background, so they got all the fundamental principles right.

The Modularity principle (both at the level of individual abstract data types and of whole modules around them) is the most fundamental one, because it not only mimics Mother Nature (multi-cellular organisms) but also addresses our severe cognitive limitations in understanding complex systems (the “7, plus or minus 2 chunks” meme from cognitive psychology).

Packages (an archaic name for libraries) allow the definition of groups (Sets) of related functions that share a local data structure (the actual representation) and hide it within a module body, literally behind an abstraction barrier, which is an abstract interface.
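
A minimal OCaml sketch of this idea, under stated assumptions: the module name Lifo and all of its functions are hypothetical, chosen only for illustration. The record with a cached count is the hidden representation, and the signature is the abstraction barrier.

```ocaml
(* Hypothetical example: a group of related functions sharing one
   hidden data structure. Clients see only the signature. *)
module Lifo : sig
  type 'a t                                (* representation is not exposed *)
  val empty : 'a t
  val push : 'a -> 'a t -> 'a t
  val pop : 'a t -> ('a * 'a t) option
  val size : 'a t -> int
end = struct
  (* The actual representation: a list plus a cached length. *)
  type 'a t = { items : 'a list; count : int }
  let empty = { items = []; count = 0 }
  let push x s = { items = x :: s.items; count = s.count + 1 }
  let pop s =
    match s.items with
    | [] -> None
    | x :: rest -> Some (x, { items = rest; count = s.count - 1 })
  let size s = s.count
end

(* Client code can only go through the exported functions. *)
let s = Lifo.(empty |> push 1 |> push 2)
let top = match Lifo.pop s with Some (x, _) -> Some x | None -> None
```

Writing s.items outside the module body is a type error, which is exactly the point: the record could be replaced by a plain list without any client noticing.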

A module traditionally corresponds to a single source file, plus a header or an interface file (for languages which require them). So, a single file (which should define and implement a single ADT) is a module, and a set of such files is a package.

Disentanglement, Decomposition (of the concepts of a problem), Partitioning (of concepts and corresponding Abstract Data Types) and Nesting (of code) are the main principles. Types and interfaces should be abstract: representation and implementation must be “hidden”, that is, not required to be known, which is the proper meaning of being abstracted out or away.

It must be possible to understand a module independently of other modules (it should be at most loosely coupled to the standard library and to a few modules at a lower level of abstraction - the previous layer of hierarchical DSLs) and to use it without needing to examine its implementation.

The idea of a module as a separate unit of compilation, meaning that, at least in theory, it is self-contained, is based on the principle of reducing cognitive load by proper decomposition (or disentanglement) of a problem into sub-problems, systems into sub-systems and of everything into its basic building blocks.

Only in this way can large and complex systems be understood. Disentanglement, decomposition and formal definition of the basic building blocks, followed by composing everything back into a coherent whole, is the right metaphor.

Modules are where the actual implementations reside, abstracted out by the corresponding interfaces (of Abstract Data Types).

Ideally, they should form well-defined layers of hierarchical abstractions, where each module interacts only with the level directly below it (in the abstraction hierarchy). Not above. Not two levels down.
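
The layering rule can be sketched in OCaml as follows. This is only an illustration: the three modules (Store, Accounts, Greeter) and their functions are hypothetical. Each layer mentions only the layer directly below it.

```ocaml
(* Layer 0 (hypothetical): raw key-value storage. *)
module Store : sig
  type t
  val empty : t
  val put : string -> string -> t -> t
  val get : string -> t -> string option
end = struct
  module M = Map.Make (String)
  type t = string M.t
  let empty = M.empty
  let put = M.add
  let get = M.find_opt
end

(* Layer 1: speaks the language of accounts, uses only Store. *)
module Accounts : sig
  type t
  val empty : t
  val register : name:string -> email:string -> t -> t
  val email_of : string -> t -> string option
end = struct
  type t = Store.t
  let empty = Store.empty
  let register ~name ~email db = Store.put name email db
  let email_of name db = Store.get name db
end

(* Layer 2: uses only Accounts, never Store. *)
module Greeter = struct
  let greet name db =
    match Accounts.email_of name db with
    | Some email -> Printf.sprintf "Hello %s <%s>" name email
    | None -> "Hello stranger"
end

let db = Accounts.(register ~name:"ann" ~email:"ann@example.org" empty)
let greeting = Greeter.greet "ann" db
```

Because Accounts.t is abstract, Greeter could not reach into Store even if it wanted to. The type checker enforces the "not two levels down" rule.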

It must be possible for different people (or teams) to develop individual modules with minimal communication - everything has to be written down and specified (interfaces, constraints, formats, protocols), and only the API one layer down (and the standard library) is required.

This is how proper DSLs should be designed and implemented. Small, specialized modules are the basic building blocks of hierarchies of pure-functional (based on higher-order functions) DSLs.

Meta-level

Cognitive overload is the primary motivation for modularity.

There is a limit to how many things a person can think about at once (hence loose coupling - separation, partitioning).

Modules should be in one-to-one correspondence with the major concepts in a domain expert’s language.

Modules are part of the Model, and they should reflect concepts in the domain.

It isn’t just code being divided into modules, but concepts.

Modules (interfaces, actually) are partitions for (in the universes of) concepts.

Modules and their names should reflect insights in the domain.

Refine the model (a spiral-shaped process of continuous refinement) until it partitions according to high-level domain concepts and the corresponding code is decoupled as well.

Principles

Everything is to be built upon Algebraic (sum and product) abstract data types.

Abstract means that clients use only the constructors (and other functions) exported from a module, rather than pattern-matching on the underlying representation.
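
A short OCaml sketch of both halves, with hypothetical names throughout (point, shape, area, Percent). The first part shows a product type and a sum type; the second shows an abstract type that can be built only through the constructor its module exports.

```ocaml
(* Product type: a record of two floats. *)
type point = { x : float; y : float }

(* Sum type: a shape is either a circle or a rectangle. *)
type shape =
  | Circle of point * float        (* centre, radius *)
  | Rect of point * point          (* two opposite corners *)

let area = function
  | Circle (_, r) -> Float.pi *. r *. r
  | Rect (a, b) -> Float.abs ((b.x -. a.x) *. (b.y -. a.y))

(* Abstract type: the only way to obtain a Percent.t is Percent.make,
   so the 0..100 invariant cannot be broken from outside. *)
module Percent : sig
  type t
  val make : int -> t option
  val to_int : t -> int
end = struct
  type t = int
  let make n = if n >= 0 && n <= 100 then Some n else None
  let to_int n = n
end
```

The shape type is deliberately left concrete so that area can pattern-match on it, while Percent.t is sealed. Which of the two a given type should be is a design decision per concept.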

Individual types correspond to (represent) mental concepts. This is where DDD fuses with FP and TDD.

A module contains all the functions specifically related to the main types.

Ideally, it should have an ML-like signature checked by the compiler, but other languages will do. The Standard ML people in the 90s got it all right.

Again, the big picture is: Concepts -> Types -> Interfaces -> Modules. We program with algebraic types in FP.

The Abstraction principle

Abstraction is the key to managing overwhelming complexity.
• Abstraction is a tool (the only one?) that people use to understand vastly complex systems.
• Abstraction allows people to know what a (sub)system does without knowing how - without memorizing all the irrelevant details.

Proper modularity is the manifestation of adequate abstraction.
• Proper modularity makes a program’s abstractions explicit.
• Proper modularity can dramatically increase clarity.

The Encapsulation principle

A well-designed module encapsulates data (information hiding).
• An interface should hide implementation details.
• A module should use its functions to encapsulate its data.
• A module should not allow clients to manipulate the data directly.

The goal is that the representation and implementation can be changed without being noticed by the “users” of the module.

It is common in ML to have multiple implementations of the same signature which can be used interchangeably.
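
For illustration, here is a sketch in OCaml with one signature and two interchangeable implementations. The names SET, ListSet and TreeSet are hypothetical; Set.Make and Int come from the OCaml standard library.

```ocaml
(* One signature... *)
module type SET = sig
  type t
  val empty : t
  val add : int -> t -> t
  val mem : int -> t -> bool
end

(* ...implemented with an unordered list... *)
module ListSet : SET = struct
  type t = int list
  let empty = []
  let add x s = if List.mem x s then s else x :: s
  let mem = List.mem
end

(* ...and with the balanced tree from the standard library. *)
module TreeSet : SET = struct
  module S = Set.Make (Int)
  type t = S.t
  let empty = S.empty
  let add = S.add
  let mem = S.mem
end

(* Client code: switching to ListSet changes this one line only. *)
module S = TreeSet
let has_three = S.(mem 3 (add 3 empty))
```

Since both modules are sealed with SET, the client cannot depend on whether a list or a tree sits underneath.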

The Necessary and Sufficient principle

A well-designed module has a minimal interface.
• A function declaration should be in a module’s interface if and only if:
  • the function is necessary to make objects complete, or
  • the function is sufficient - consistent for many clients.

Record

A module is just like a record of named and typed slots (type-signatures) and just like a tagged Cartesian product. Access is similar to selectors (for slots of a record).

Modules can be nested, and even applied to one another (in ML or OCaml).

Conceptually, a module is an instance of a record (it is not a record-type) with given type-signatures (each of which is a symbol-type pair).

We cannot say that a module is a Cartesian Product of typed slots (type-signatures), because they are fixed (are constants) and do not range over some Set (like variables).

This is why we said “like”.
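
The record analogy can be made concrete in OCaml. The sketch below uses hypothetical names (int_ops, INT_OPS, Add) and puts a record of named, typed slots next to a module with the same slots. The last lines show that OCaml can even pack a module into an ordinary value.

```ocaml
(* A record type with two named, typed slots, and one instance of it. *)
type int_ops = { name : string; combine : int -> int -> int }
let add_record = { name = "add"; combine = ( + ) }

(* A module with the same two slots, described by a signature. *)
module type INT_OPS = sig
  val name : string
  val combine : int -> int -> int
end

module Add : INT_OPS = struct
  let name = "add"
  let combine = ( + )
end

(* Access looks alike: a field selector versus a qualified name. *)
let a = add_record.combine 2 3
let b = Add.combine 2 3

(* A module packed as a first-class value, then unpacked again. *)
let packed = (module Add : INT_OPS)
let c = let module M = (val packed : INT_OPS) in M.combine 2 3
```

The difference the text points out remains visible: add_record is one of many possible values of type int_ops, while Add is a single fixed structure.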

Structure

So it is an instance (a single record-like value), a particular structure, as these are called in Standard ML and OCaml (F# simply calls them modules).

The access is similar to selectors for a record’s slots.

Signatures

Each such “structure” is an instance of a particular “signature” which corresponds to its particular “structural type”.

Conceptually, a module-signature is just a Set of type-signatures of individual values (which could be functions, exceptions or even types).

Syntactically, a module signature is a list of individual type-signatures.

The type of a structure is captured in its signature and contains all of the static properties of a module that are needed by some other module that might use it.
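
As a sketch of what “structural type” means here, the OCaml example below (all names are hypothetical) views one structure through two different signatures. Matching is by the items a structure provides, not by any declared relationship.

```ocaml
module type SHOW = sig
  type t
  val show : t -> string
end

module type ORDERED = sig
  type t
  val compare : t -> t -> int
end

(* One structure that happens to provide more than either signature asks for. *)
module IntImpl = struct
  type t = int
  let show = string_of_int
  let compare = Int.compare
  let secret = 42            (* not mentioned in any signature below *)
end

(* The same structure seen through two signatures. *)
module IntShow : SHOW with type t = int = IntImpl
module IntOrd : ORDERED with type t = int = IntImpl

let seven = IntShow.show 7
let order = IntOrd.compare 1 2
```

IntShow.secret and IntShow.compare do not type-check: the signature lists everything another module is allowed to rely on.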

Functors in ML

The use of one module by another is captured by special functions called functors that map structures to new structures.
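
A minimal functor sketch in OCaml. MakeMax and ORDERED are hypothetical names; Int and String are standard library modules that already provide a type t and a compare function.

```ocaml
(* What the functor requires from its argument. *)
module type ORDERED = sig
  type t
  val compare : t -> t -> int
end

(* A functor: maps any ORDERED structure to a new structure. *)
module MakeMax (O : ORDERED) : sig
  val max_of : O.t list -> O.t option
end = struct
  let max_of = function
    | [] -> None
    | x :: xs ->
      Some (List.fold_left
              (fun best y -> if O.compare y best > 0 then y else best)
              x xs)
end

(* Applying the functor to two different structures. *)
module IntMax = MakeMax (Int)
module StrMax = MakeMax (String)

let biggest_int = IntMax.max_of [3; 9; 2]
let biggest_str = StrMax.max_of ["b"; "a"]
```

The functor body uses nothing but what ORDERED promises, so it works for any structure with that signature, including ones written later.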

Abstraction barrier

This module-signature forms, at least in theory, an actual Abstraction Barrier (which acts like the cell-membrane of a module) that hides (encapsulates) the “concrete” representation and implementation of an Abstract Data Type.

This analogy with cell biology is deliberate and profound. Abstraction barriers (proper message-passing interfaces) are cell-membranes between loosely coupled modules (and corresponding core types).

Interfaces

The interface of a module gives the type of each exported value.

For exported types, the interface may give either their complete definition or simply their name.
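
Both options can appear in the same interface. In the OCaml sketch below (Account and everything in it are hypothetical names), currency is exported with its complete definition, while t is exported by name only.

```ocaml
module Account : sig
  type currency = USD | EUR     (* complete definition: clients may match on it *)
  type t                        (* name only: the record below stays hidden *)
  val open_ : currency -> t
  val deposit : int -> t -> t
  val balance : t -> int
  val currency : t -> currency
end = struct
  type currency = USD | EUR
  type t = { cur : currency; cents : int }
  let open_ cur = { cur; cents = 0 }
  let deposit n a = { a with cents = a.cents + n }
  let balance a = a.cents
  let currency a = a.cur
end

(* Pattern-matching on the fully exported type is allowed... *)
let label acct =
  match Account.currency acct with
  | Account.USD -> "usd"
  | Account.EUR -> "eur"

(* ...while values of the abstract type go through exported functions only. *)
let acct = Account.(deposit 250 (open_ EUR))
```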

This is how the mind handles concepts - it might know some of the details of a concept, or it might only have heard of it and know what it stands for (is associated with).

Modules export a Set of public (use-side, exposed) interfaces (which are, in turn, Sets of individual type-signatures).

These sets of interfaces (module-signatures) can be specialized or even applied to one another.

SML (and OCaml) got it right. F# too.

Modular programming

The most important principle in the whole of Computer Science - the Modularity Principle - is about how (and why) Abstract Data Types should be defined.

One of the fundamental advantages is that an underlying representation (and related algorithms) could be changed and improved without breaking the public interfaces.

Remember that data dominates. A stable interface implies that there is no need to recompile client code if it uses dynamic (shared) libraries, as long as the binary interface stays compatible, so the existing system could be improved without a re-install.

This is exactly what pragmatic UNIX systems got right by introducing shared libraries.

And this is precisely what delusional purists advocating “reproducible builds” got absolutely wrong. Focus on stability of interfaces and you won’t need to hash every single build like Haskell tools do.

Modules are abstractions closely related to ADTs, with the core principle of being separate (independent) units of compilation and linking. While interfaces establish abstraction barriers which partition and hook together the code, modules are about compilation units.

They also solve name clashes (conflicts) by providing fully qualified symbols (names). When we hear about “Modular programming” as a paradigm, it implies ADTs and Modules with name-spaces.
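
A small OCaml sketch of the name-space role, with hypothetical module names: two modules define a function with the same name, and qualified names keep them apart.

```ocaml
(* Two modules export the same name without conflict. *)
module Celsius = struct
  let to_string t = Printf.sprintf "%.1f C" t
end

module Fahrenheit = struct
  let to_string t = Printf.sprintf "%.1f F" t
end

(* The fully qualified name says which one is meant. *)
let room = Celsius.to_string 21.5
let oven = Fahrenheit.to_string 350.0
```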

Modules are the larger-scale building blocks, while the ultimate building blocks of any software are interfaces and ADTs (which encapsulate actual implementations).

Standard ML (and OCaml) perfected the module system by defining modules as having a certain structure (similar to graphs in mathematics) and by providing a DSL for the application of one module to another (yielding a new related structure), similar to what mathematical Functors do (they are structure-preserving transformations).

To summarize, modules are about standardized interfaces, exactly like ADTs (for which they provide the actual data representations and implementations), while compilation units are of secondary importance. The ML guys got it right, others missed the point.