There ARE facts.

Ultimately, all the mathematical techniques being used amount to the “fitting of a curve” (or of a “hyperplane”, as they say) or, more generally, to the recognition of a repeated (recurring) pattern, using a form of abstract, structural pattern-matching.

The pattern has to be “real” or existent (not imaginary) and has to be actually present in the training set.

Given a real “shape”, we “learn” an abstract mathematical structure which matches that shape “perfectly”, using “backprop” (the systematic changing of the abstract mathematical structure until it matches well enough, according to some metric).

Both “classification” and “pattern recognition” are instances of this generalized notion of “supervised learning”, which means that the examples are correctly labeled with the “ground truth” values.

In this Universe the actual physical constraints of the shared environment correspond to the “truth labeling” and are the ultimate causes of all the recurring patterns, from the atomic level all the way up to human intelligence.

For a recurring pattern to be recognized, it has to be out there. Given nonsense, a network would “learn” to compute the nonsense.
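
To make the above concrete, here is a minimal sketch (plain Python, synthetic data of my own, not from the text): a line y = a*x + b is repeatedly adjusted along the partial derivatives of a squared-error metric until it matches the pattern that is actually present in the correctly labeled pairs (x, y).

  import random

  # "Fitting a curve": adjust the coefficients a, b of y = a*x + b along the
  # negative partial derivatives of a squared-error metric, until the line
  # matches the pattern that really is in the data (here y = 2x + 1 + noise).

  random.seed(0)
  xs = [i / 10 for i in range(100)]
  data = [(x, 2.0 * x + 1.0 + random.gauss(0, 0.1)) for x in xs]

  a, b = 0.0, 0.0            # the "abstract mathematical structure" being adjusted
  lr = 0.01                  # learning rate

  for step in range(3000):
      grad_a = grad_b = 0.0
      for x, y in data:
          err = (a * x + b) - y      # signed error of the current fit
          grad_a += 2 * err * x      # d(err^2)/da
          grad_b += 2 * err          # d(err^2)/db
      a -= lr * grad_a / len(data)
      b -= lr * grad_b / len(data)

  print(a, b)                # ends up close to 2 and 1 -- the pattern that was there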

Expressions in a graph

An acyclic graph without “forks” (only “joins”) is a universal generalized pattern, which, ultimately, represents causality, according to the fundamental Causality Principle (there is nothing random in the Universe).

The “forks”, by the way, represent different potential possibilities and are distinct abstract notions on a different level of abstraction. Causality has no notion of Potentiality, but potential outcomes, like everything else, depend on the possible Causes.
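
As a toy illustration of such a “joins only” acyclic graph (the node names and values are made up for the sketch): every node combines (“joins”) its inputs and yields a single value that flows strictly forward, the way effects follow from their causes.

  # node: (function, list of input nodes); every node "joins" its inputs into
  # a single output value, and values flow only forward -- no cycles.
  graph = {
      "a":   (lambda: 2.0, []),
      "b":   (lambda: 3.0, []),
      "sum": (lambda a, b: a + b, ["a", "b"]),    # a join of two causes
      "out": (lambda s: s * s, ["sum"]),
  }

  def evaluate(node, cache=None):
      if cache is None:
          cache = {}
      if node not in cache:
          fn, deps = graph[node]
          cache[node] = fn(*(evaluate(d, cache) for d in deps))
      return cache[node]

  print(evaluate("out"))     # 25.0 -- the value "unfolds" from its causes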

The graph is built of mathematical expressions, which approximate pure mathematical functions (usually returning probabilities, which in itself should already raise an alarm).

Notably, a neural network can be a “learned implementation” of a pure mathematical function (same inputs - same outputs, always).

The “learning” is accomplished by adjustment of the “weights” (coefficients) using the mathematical framework of partial derivatives.

In principle, a sufficiently large and properly structured network could “learn” to approximate any pure function, provided it has been given enough correctly labeled (ground truth) training examples.

This very mathematical notion is behind the “without being explicitly programmed” meme.
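
Here is a hedged sketch of that notion (assuming a one-hidden-layer tanh network, a squared-error metric and numpy; the sizes, the learning rate and the target function sin(x) are arbitrary choices of mine, not from the text): the weights are adjusted by the partial derivatives of the error until the network approximates a pure function from its correctly labeled examples.

  import numpy as np

  rng = np.random.default_rng(0)
  x = rng.uniform(-np.pi, np.pi, size=(256, 1))    # inputs
  y = np.sin(x)                                    # correctly labeled "ground truth"

  H = 16                                           # hidden units (arbitrary)
  W1 = rng.normal(0.0, 0.5, (1, H)); b1 = np.zeros(H)
  W2 = rng.normal(0.0, 0.5, (H, 1)); b2 = np.zeros(1)
  lr = 0.05

  for step in range(5000):
      h = np.tanh(x @ W1 + b1)                     # hidden layer
      out = h @ W2 + b2                            # same inputs -> same outputs
      err = out - y
      # partial derivatives of 0.5 * mean squared error w.r.t. each weight
      dW2 = h.T @ err / len(x)
      db2 = err.mean(axis=0)
      dh = (err @ W2.T) * (1.0 - h ** 2)           # chain rule through tanh
      dW1 = x.T @ dh / len(x)
      db1 = dh.mean(axis=0)
      W1 -= lr * dW1; b1 -= lr * db1
      W2 -= lr * dW2; b2 -= lr * db2

  print(float(np.mean(err ** 2)))   # should end up well below the initial error (about 0.5)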

The shape or “topology”

The shape or “architecture” of a network matters, especially when we try to use the universal notions of “feeding-forward” and “feedback loops”.

Feedback loops correspond to cycles in the graph, so it is not acyclic anymore.

However, the crucial property can be “restored” by hiding (encapsulating) all the “cycles” or “loops” within (inside) the nodes, so the whole graph is again “joins only”.

It has to be directional, just like Causality itself (everything “unfolds”). The abstract notion of an arrow, which captures a direction, is, again, universal and not accidental.

This, of course, corresponds to the emergent pattern in algorithmic flow-charting - one has to have “linearity” to have a proper composition, so both “conditionals” and “loops” have a single value at their “output”.

The graph-reduction technique for the implementation of pure-functional languages got this universal notion just right.

So, we could feed the results back and forward, as long as it stays inside a composable “black box” abstraction (an expression).
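
A toy sketch of this encapsulation (the function names and the toy update rule are mine, purely illustrative): the “feedback loop” lives inside one function which, seen from the outside, is a pure expression with a single output value, so it composes with other nodes of an acyclic graph.

  def recurrent_block(xs, w=0.5, u=0.9):
      state = 0.0
      for x in xs:                   # the internal "feedback loop" (a cycle)
          state = w * x + u * state  # the state feeds back into itself
      return state                   # a single value at the "output"

  def scale(x, k=2.0):               # an ordinary pure node
      return k * x

  # From the outside the loop is invisible: plain composition, no cycles.
  print(scale(recurrent_block([1.0, 2.0, 3.0])))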

Statistical likelihood is what it captures

When one says that a neural network “predicts” something, it is already bullshit.

Conditional probabilities are not “predictions”; they are measures of repeated observations. Any generalizations from these measures are arbitrary.
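
A toy illustration of this point (the data and the numbers are synthetic, made up for the sketch): the “conditional probability” is nothing but a ratio of counted past observations, and using it is merely systematically less wrong than a uniform random guess on the same data.

  import random

  random.seed(1)
  # synthetic observations: "b" follows "a" about 80% of the time
  observations = [("a", "b") if random.random() < 0.8 else ("a", "c")
                  for _ in range(1000)]

  count_b = sum(1 for _, s in observations if s == "b")
  p_b_given_a = count_b / len(observations)       # just a ratio of counted observations
  print(p_b_given_a)                              # roughly 0.8

  # always answering the most frequent symbol vs. guessing uniformly at random
  hits_model = sum(1 for _, s in observations if s == "b")
  hits_guess = sum(1 for _, s in observations if s == random.choice(["b", "c"]))
  print(hits_model / 1000.0, hits_guess / 1000.0) # roughly 0.8 vs roughly 0.5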

Statistics in general is not semantic knowledge, by definition. Period. And no, our minds are not thinking “out of statistics”. Our minds use “structural networks” which represent “semantics”, not Numbers representing probabilities.

This is how any of the current NNs are fundamentally different from what the brain does. The brain is structural, not numerical. Mother Nature does not count.

And no, sampling from a probability distribution does not constitute any “semantic knowledge” or any form of intelligence, except for being systematically (and measurably) less wrong, by some particular metric, than a pure “random guess”.

This is approximately how our perception systems have evolved, but perception alone, even pattern recognition alone, is not in itself intelligence.

Perception, not intelligence.

Neural networks are [tools of] recognition and perception, not intelligence.

Author: <schiptsov@gmail.com>

Email: lngnmn2@yahoo.com

Created: 2023-08-08 Tue 18:38

Emacs 29.1.50 (Org mode 9.7-pre)