Vedanta (वेदान्त)
Reaching Vedanta every day.
The Upanishadic seers (back then) postulated the existence of a veil which distorts our view of reality (of What Is). This veil (there were no glasses at that time) somehow alters our perception and makes us delusional.
The Buddha and early Buddhists postulated that it is an improper state of The Mind which leads to a fundamental mismatch with What Is, which is the definition of what they call ignorance, and, necessarily, is the ultimate cause of suffering.
They are still not wrong. The same principles apply to artificial and even to abstract (theoretical) intelligent agents - a mismatch with the actual environment (an inadequate inner representation) is fatal for performance.
The same is an operational definition of psychological insanity - having an inadequate representation inside one's brain.
The ultimate abstraction principle at work is that we postulate the actual existence of an inner representation of (some aspects of) the shared environment (reality), and abstract away all the “mechanics” of how exactly it is constructed, maintained and used for decision making.
So, what is required (necessary and sufficient) to model what is going on inside our minds?
Well, ultimately everything can be reduced to just dots and arrows in between, plus, which is always overlooked, an environment in which it all takes place, which implies some set of explicit rules, derived from fundamental “laws”.
A dot captures the notion of a value, which is, in turn, a localized (in what we call space and time, which are the notions of an External Observer) "physical" structure - an atom, a single base molecule, a protein, whatever.
An Arrow is a highly overloaded concept, and there are arrows of different kinds, but fundamentally an arrow captures the notion of a transition, which always implies a direction.
This notion captures a single step of evaluation, reduction or evolution of a process.
Ideally, a word is a label for (and an implicit reference to an inner representation of) such a physical structure. We should never have "mappings" to bullshit, but…
An inner environment is traditionally modeled as nested Sets and represented as nested lists of pairs (alists) and immutable references - the arrows.
What is it that we "invoke" or "retrieve" with a word? It is not an image, it is not a description, it is a feeling - it is always, in principle, different from what we refer to with the word, and from what we really want to "see" or recall.
Again, we abstract out (and away) the actual implementation details. We invoke something by thinking of a word, and that something is always a distorted feeling (a "stored" perception).
How exactly does it happen that I arguably almost always come up with just the right words, and arguably produce less bullshit? Because I apparently have a "closer", less distorted representation of reality.
Now let's see what the Lambda Calculus (the minimal necessary and sufficient formalism to express everything) consists of.
- bindings, which imply an environment and nesting.
- application, which requires a set of rules and an environment
- lambdas, which capture the abstraction principle of the Mind.
There is another fundamental building block, which is usually ignored - partitions, which correspond to types, and ultimately to entities of the same structure, or having the same properties.
And THIS is exactly what the Curry-Howard isomorphism is all about - the two seemingly independent branches converge back to What Is (logic requires the notions of partitions and environments because these are universal).
And this is all we need. Bindings are dots, applications are arrows and lambdas are closures - “captured patterns” of the mind.
From these stable building blocks, as has been demonstrated a few times (most recently by Wadler in his Agda book), all of mathematics can be built (constructed).
This is not an arbitrary result. The significance is that everything the Mind does is reducible to "maintaining" and "using" an inner representation of reality, which itself requires an environment with storage and retrieval facilities. (Again, we are abstracting everything away - using conceptual lambdas.)
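The three building blocks - bindings in a nested environment, application, and lambdas as closures - can be sketched as a toy evaluator. This is a minimal illustrative sketch; all the names (`Term`, `eval`, etc.) are mine, not from any particular implementation.

```haskell
-- A toy untyped lambda-calculus evaluator: bindings ("dots"),
-- applications ("arrows"), and lambdas as closures - "captured
-- patterns" over an environment. Names here are illustrative.
data Term = Var String      -- a reference into the environment
          | Lam String Term -- an abstraction
          | App Term Term   -- an application (a transition)

data Value = Closure String Term Env
type Env = [(String, Value)] -- a nested alist of immutable bindings

eval :: Env -> Term -> Value
eval env (Var x)   = maybe (error ("unbound: " ++ x)) id (lookup x env)
eval env (Lam x b) = Closure x b env
eval env (App f a) =
  case eval env f of
    -- extend the captured environment with a new immutable binding
    Closure x b cenv -> eval ((x, eval env a) : cenv) b

main :: IO ()
main = case eval [] (App (Lam "x" (Var "x")) (Lam "y" (Var "y"))) of
         Closure v _ _ -> putStrLn ("closure over " ++ v)
```

Note how nothing else is needed: lookup in an environment (dots), a single reduction step (arrows), and closures (lambdas).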
Thus we are removing the distorting veil by improving our inner representation (the weights) to match the reality perfectly. And yes, all NNs are just instances of this universal pattern discovered by the Upanishadic seers using the only tool available at the time - introspection, which is “looking inside, observing the observer - the Mind itself”.
In Mathematics we study abstract patterns (abstracting away any possible representation or actual implementation of an algorithm or a structure). Algorithms are procedures, and a computer is a person.
In CS every value implies some representation and it is usually a part of some Abstract Data Type, which, in turn, implies some (usually there is more than one) actual implementation, based on some particular data-structure.
Math is untyped, too abstract and actually easy.
NNs
Now let's take a ride on a conceptual roller-coaster. What is a Neural Network? It is a structure, in the first place, just like the molecular structures inside your brain. Everything in the Universe has a structure.
NNs are abstract mathematical structures, represented as "concrete" syntax trees. Each node is a pure mathematical function (transformation) - not just an arbitrary function, but one of a particular kind, a differentiable function (transformation) - paired with some input values, which are called "weights".
Ultimately, each node is a transformation of values (scaling), a "join" (a weighted sum), or a "rearrangement" - a structural transformation. No information is lost or discarded, only transformed.
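A single node, then, is just a weighted sum followed by a differentiable transformation. A minimal sketch (the name `neuron` and the choice of `tanh` are mine, for illustration only):

```haskell
-- A single "node": a weighted sum of inputs and a bias, followed by a
-- differentiable transformation (here tanh). The weights are just
-- values paired with the function.
neuron :: [Double] -> Double -> [Double] -> Double
neuron ws b xs = tanh (b + sum (zipWith (*) ws xs))

main :: IO ()
main = print (neuron [0.5, -0.5] 0.0 [1.0, 1.0])  -- tanh 0 = 0.0
```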
This is exactly why and how a Neural Network, after successful training, could, in principle, approximate any function. It also implies, if you think for a minute, that only "stable environments", in which nothing is being overwritten, broken or destroyed behind your back, can be approximated.
The stability property (of a shared environment), it turns out, is a universal property, which is required not just for a Mind, but for Life Itself. Otherwise there will be a fucking mess and inevitable suffering. And there it is.
All that is required is the principal ability to "learn by experience", which is implemented as the ability to "reach back" through the syntax tree (of a computation) and adjust (refine) the weights for each new experience. Thus the whole abstract structure (which defines and represents a computation) literally converges to What Is.
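"Reaching back and adjusting" can be sketched, in the simplest possible setting, as gradient descent on a single weight: nudge the weight against the slope of a loss. This is an illustrative sketch with made-up names (`step`, `loss`); real frameworks differentiate the whole syntax tree symbolically, not numerically.

```haskell
-- Adjust a single weight against the numerically estimated slope of
-- the loss - one "reach back" step. Illustrative only.
step :: (Double -> Double) -> Double -> Double
step loss w = w - lr * grad
  where
    lr   = 0.1
    eps  = 1e-6
    -- central-difference estimate of the derivative of the loss at w
    grad = (loss (w + eps) - loss (w - eps)) / (2 * eps)

main :: IO ()
main =
  -- repeated steps converge toward the minimum of (w - 2)^2, i.e. 2
  print (take 3 (iterate (step (\w -> (w - 2) ^ 2)) 0))
```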
We alter the inner representation, which is the ultimate principle, just like what the brain does. The brain has lots of evolved protective mechanisms, which basically filter out the "noise" and occasional delusions due to errors of perception.
What is required to actually represent these abstract structures? An environment, with “Dots and Arrows” in it. What is required to implement the arrows? “Dots and Arrows”. What is required to implement dots and arrows? Nested environments and representation for dots and arrows, plus an abstract machine - the Evaluator (do you remember these MIT guys wearing funny hats?).
An implementation of the Simply Typed Lambda Calculus is enough for running Neural Nets, and the necessary and sufficient building blocks are the same as those of the Lambda Calculus itself - immutable bindings, pure transformations, partitioned environments.
All you need is Lambda, they said. This is an ultimate result.
Math
Function
A function \(f : A \rightarrow B\) from a Set \(A\) (the domain) to a Set \(B\) (the codomain or range) is a rule \(f\) (defined formally) so that \(\forall x \in A, \exists! y \in B \mid f(x) = y\).
We call \(f(x)\) the image of \(x\) (in \(B\)).
One-to-one
The function \(f\) is one-to-one (or an injection) if and only if \(\forall x_1, x_2 \in A, f(x_1) = f(x_2) \Rightarrow x_1 = x_2\).
Surjection
The function \(f\) is onto (or a surjection) if and only if \(\forall y \in B, \exists x \in A \mid f(x) = y\)
Bijection
A function is a bijection if and only if it is both one-to-one and onto.
Composition
If \(f: A \rightarrow B \land g: B \rightarrow C\) are functions, their composition \(g \circ f : A \rightarrow C\) is given by \[g \circ f(x) = g(f(x))\].
Nesting is the natural and the only way to define an order of evaluation for a composition of two functions.
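The nesting in \(g(f(x))\) is literally how composition is defined in code. A minimal sketch (the name `compose` is mine; Haskell's own operator for this is `(.)`):

```haskell
-- Composition as nesting: (compose g f) x = g (f x).
compose :: (b -> c) -> (a -> b) -> (a -> c)
compose g f = \x -> g (f x)

main :: IO ()
main = print (compose (* 2) (+ 1) 3)  -- (3 + 1) * 2 = 8
```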
Equality
Two functions \(f : A \rightarrow B\) and \(g : A \rightarrow B\) are equal if and only if \[\forall x \in A, f(x) = g(x)\].
Algebraic System
A Set \(S\) together with an operation \(\bigoplus\) is an algebraic system denoted as \((S, \bigoplus)\) (just an abstract tuple).
Operation
A Set \(S\) has a binary operation \(\bigoplus\) if and only if \[\forall x, y \in S \exists! z \in S \mid x \bigoplus y = z\]
Anything that satisfies the conditions is an operation.
Closure
For a system \((S, \bigoplus)\), a subset \(T\) of \(S\) is closed under \(\bigoplus\) if and only if \[\forall t, u \in T, t \bigoplus u \in T\] That is, \(\bigoplus\) is an operation for \(T\). Identity is not required.
Commutative
An operation \(\bigoplus\) is commutative on \(S\) if and only if \[\forall a, b \in S, a \bigoplus b = b \bigoplus a\]
This is a property of an operation itself. The meaning is that the order of arguments does not matter.
Associative
An operation \(\bigoplus\) on Set \(S\) is associative if and only if \(\forall a, b, c \in S\) we have \[(a \bigoplus b) \bigoplus c = a \bigoplus (b \bigoplus c)\]
This is a property of a composition of two or more applications of the same operation. The meaning is that the order of grouping (which is applied first) does not matter, due to the Closure property.
Identity is a separate notion and is not required.
Distributive
An operation \(\bigotimes\) on \(S\) distributes over \(+\) on \(S\) if and only if \[\forall a, b, c \in S, a \bigotimes (b + c) = (a \bigotimes b) + (a \bigotimes c) \land (b + c) \bigotimes a = (b \bigotimes a) + (c \bigotimes a)\]
This is a property of a pair of operations.
This is not just scaling of a sum. Logical connectives distribute over each other.
FOIL
Suppose in \((S, +, *)\) that \(+\) is associative and \(*\) distributes over \(+\), and \(a, b, c, d \in S\). Then \[(a + b) * (c + d) = (a * c) + (a * d) + (b * c) + (b * d)\]
This is a lemma that follows from the distributive law.
Identity
A system \((S, \bigoplus)\) has an identity element \(e \in S\) if and only if \[\forall x \in S, x \bigoplus e = x = e \bigoplus x\]
Each binary operation may or may not have its own identity.
If \((S, \bigoplus)\) has an identity, the identity is unique.
Monoid
\((S, \bigoplus, e)\) such that \(\forall x, y, z \in S\) identity and associativity laws hold: \[x \bigoplus e = x = e \bigoplus x\] \[x \bigoplus (y \bigoplus z) = (x \bigoplus y) \bigoplus z\] Commutativity is not required for a Monoid (though the familiar \(+\) and \(*\) happen to be commutative).
Inverse
For \((S, \bigoplus, e)\) with identity \(e\), an inverse of \(x \in S\) is some \(y \in S\) such that: \[x \bigoplus y = e = y \bigoplus x\].
Note that this requires \(x\) and its inverse \(y\) to commute with each other (\(x \bigoplus y = y \bigoplus x = e\)); it does not make \(\bigoplus\) commutative in general.
For example, \(a - b = a + (-b)\) where \(+\) is commutative and \(-b\) is an additive inverse of \(b\).
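The \(a - b = a + (-b)\) example, spelled out in code (the name `sub` is mine; `negate` is the standard additive-inverse function):

```haskell
-- Subtraction defined via the additive inverse: a - b = a + (-b).
sub :: Integer -> Integer -> Integer
sub a b = a + negate b

main :: IO ()
main = print (sub 7 3)  -- 7 + (-3) = 4
```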
Uniqueness of an inverse
If \((S, \bigoplus, e)\) is associative and has an identity \(e\) and an element \(x\) has an inverse, then the inverse is unique.
Inverse distributes over \(\bigoplus\)
- symmetry: \((x^{-1})^{-1} = x\)
- distributive law: \((x \bigoplus y)^{-1} = y^{-1} \bigoplus x^{-1}\), which equals \(x^{-1} \bigoplus y^{-1}\) when \(\bigoplus\) is commutative
Group
Associativity, Identity, Inverses \(\forall x\).
A system \((G, \bigoplus, e)\) is a Group if and only if \(\bigoplus\) is associative, has an identity \(e\), and every element has an inverse.
Commutativity is not required.
Ring
A Set with two operations \((S, +, *)\) is a ring if and only if \((S, +)\) is an abelian group with identity \(0\), \(*\) is associative, and \(*\) distributes over \(+\).
Abelian group means that the group operation \(+\) is commutative.
Field
A ring \((F, +, *)\) is a field if and only if \(*\) is commutative, has a multiplicative identity, and \(\forall x \in F \mid x \ne 0\), \(x\) has a multiplicative inverse.
Number systems
- \(\mathbb{N}\) has no inverses; each of \(+\) and \(*\) forms a Monoid.
- \(\mathbb{Z}\) has only additive inverses (an inverse must exist \(\forall x\), which fails for \(*\))
- \(\mathbb{Q}\) has additive and multiplicative inverses (\(a \div b = a * \frac{1}{b}\))
- \(\mathbb{R}\) has it all
Semi-group
Just an associative operation - a Group without identity and inverses (equivalently, a Monoid without an identity).
Functor
Originally defined on graphs as a transformation that preserves the structure of a graph.
Traditionally in mathematics graphs are defined as a tuple of Sets - \(N\) as a Set of Nodes and \(E\) as a Set of Edges, where each Edge is itself a tuple which contains a pair of Nodes (so it is ordered).
Structure-preserving mappings are called homomorphisms (of the same form).
For simple graphs, \(h\) is a total function \(h: (N, E) \rightarrow (N', E')\) such that:
\[\forall (A, B) \in E, (h(A), h(B)) \in E'\]
Homomorphisms are structure-preserving as they map Edges to Edges (Nodes are preserved implicitly).
There is another, more precise and detailed formulation of a Functor for labeled graphs, which is irrelevant here.
A Functor is just a generalized homomorphism - a structure-preserving transformation, whatever the structure might be.
Category theory
Here the notion of a Functor has been generalized to a mapping between Categories, which are abstract algebraic structures of objects and arrows.
When it maps arrows, identity and associativity have to be preserved (these are properties of an algebraic "structure"): \[\forall f: A \rightarrow B, \exists! Ff: FA \rightarrow FB\] \[F(\mathrm{id}_A) = \mathrm{id}_{FA}\] \[\forall f: A \rightarrow B, g: B \rightarrow C \mid F(g \circ f) = Fg \circ Ff\]
Constraints on Sets
We use set comprehension notation to specify constraints on subsets (using guards, which are predicates). \[\{ x \mid x \leftarrow S, x \geq 0\}\]
Defining subsets is partitioning on a property or a constraint.
Intersections of such partitions (subsets) are where all the "typing" is.
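The set-builder notation above transliterates directly into a list comprehension, where the guard is the partitioning predicate (the name `nonNegative` is mine, for illustration):

```haskell
-- { x | x <- S, x >= 0 } as a list comprehension; the guard x >= 0
-- carves out the sub-"Set" (partition).
nonNegative :: [Int] -> [Int]
nonNegative s = [x | x <- s, x >= 0]

main :: IO ()
main = print (nonNegative [-2, -1, 0, 1, 2])  -- [0,1,2]
```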
CS
Types are Sets
Semi-group
No notion of an inverse. No notion of an identity.
```haskell
class Semigroup a where
  (<>) :: a -> a -> a
  sconcat :: NonEmpty a -> a           -- fold of a non-empty list
  stimes :: Integral b => b -> a -> a  -- n repetitions of (<>)
  {-# MINIMAL (<>) #-}
```
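For example, the list instance of `(<>)` is concatenation, and `sconcat` folds a non-empty list of values with it:

```haskell
-- List (<>) is concatenation; sconcat folds a NonEmpty list with (<>).
import Data.List.NonEmpty (NonEmpty ((:|)))
import Data.Semigroup (sconcat)

main :: IO ()
main = do
  print ([1, 2] <> [3 :: Int])                -- [1,2,3]
  print (sconcat ([1] :| [[2], [3 :: Int]]))  -- [1,2,3]
```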
Monoid
Here a Monoid is a (subclass of a) Semigroup:
```haskell
class Semigroup a => Monoid a where
  mempty :: a             -- an identity
  mappend :: a -> a -> a  -- an associative operation
  mconcat :: [a] -> a
  {-# MINIMAL mempty #-}
```
- List concatenation (++)
- Logical conjunction (\(\land\)) and disjunction (\(\lor\))
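Both examples in the list above are in the standard library: lists form a Monoid under `(++)` with `[]` as identity, and `Bool` forms two Monoids via the `All` (conjunction) and `Any` (disjunction) wrappers:

```haskell
-- List and Bool Monoids from base; All/Any wrap (&&)/(||).
import Data.Monoid (All (..), Any (..))

main :: IO ()
main = do
  print (mconcat [[1], [2, 3], [] :: [Int]])      -- [1,2,3]
  print (getAll (mconcat [All True, All True]))   -- True
  print (getAny (mconcat [Any False, Any True]))  -- True
```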
Functor
Generalized notion of mapping over a value which preserves the structure (whatever it may be).
Generalized mapping on Lists dates back to the early Lisps.
```scheme
(define map
  (lambda (f xs)
    (if (null? xs)
        xs
        (cons (f (car xs))
              (map f (cdr xs))))))
```
To be a Functor (for a type) is to have the fmap function implemented.
```haskell
class Functor f where
  fmap :: (a -> b) -> f a -> f b
```
No additional constraints, except the arity of the function which fmap takes.
Ocaml modules
Application of one structure to another (yielding a new structure of the same shape).
It is used for specialization and for extension of existing modules.
Constraint on Types
We define classes of parameterized types (similar to Sets of Sets with constraints (Boolean guards) and the subset relation).
Type-classes
Duck-typing - to be an X is to implement a particular interface (a set of names with certain type-signatures), subject to formally defined constraints (and invariants) stated elsewhere.
With class-subclass relations (type-level constraints), in order to be (an instance of) an X it is required to (already) be (an instance of) a Y.
Eq (Equality)
```haskell
class Eq a where
  (==), (/=) :: a -> a -> Bool
  x /= y = not (x == y)  -- default implementation
```
Ord (Ordered)
```haskell
class Eq a => Ord a where
  (<), (<=), (>), (>=) :: a -> a -> Bool
  min, max :: a -> a -> a
  min x y | x <= y    = x
          | otherwise = y
  max x y | x <= y    = y
          | otherwise = x
```
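A tiny usage sketch (the type `Size` is mine, for illustration): deriving Eq and Ord for a new type makes the default `min`/`max` work, with ordering taken from constructor order.

```haskell
-- Eq and Ord derived for a toy type; min/max come from Ord defaults.
data Size = Small | Large deriving (Eq, Ord, Show)

main :: IO ()
main = do
  print (Small == Large)   -- False
  print (min Small Large)  -- Small
```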
Num (Numbers)
This corresponds to a Group with 3 operations.
\((-)\) is not a Monoid operation (not associative - the order of arguments is relevant).
```haskell
class Num a where
  (+), (-), (*) :: a -> a -> a
  negate, abs, signum :: a -> a
```
Integral (Z - Integers)
A constraint on Num (a specialization, a subset).
```haskell
class Num a => Integral a where
  div, mod :: a -> a -> a
```
Fractional (Q - Rationals)
```haskell
class Num a => Fractional a where
  (/) :: a -> a -> a
  recip :: a -> a
  recip n = 1 / n
```
Bool (Boolean values)
```haskell
data Bool = False | True
  deriving (Eq, Ord, Show, Read)
```
Classic Logic
Logical conjunction:
```haskell
(&&) :: Bool -> Bool -> Bool
False && _ = False
True  && b = b
```
Logical disjunction:
```haskell
(||) :: Bool -> Bool -> Bool
False || b = b
True  || _ = True
```
Logical negation:
```haskell
not :: Bool -> Bool
not True  = False
not False = True
```
Guard that always succeeds:
```haskell
otherwise :: Bool
otherwise = True
```