Linear Algebra for AI

This is a subject with implicit structural typing, a sort of structural dynamic typing (instead of explicit type-tags).

The structure of a single column-vector can be interpreted as a representation of various concepts.

The issue here is that, while the rules of linear algebra prevent us from applying operations to vectors of non-matching dimensions (dynamic typing) and enforce component-wise operations and isolation of the “coordinates”, they do not prevent us from mixing and matching conceptually unrelated representations.

Some “abstract theoreticians” pay no attention to the abstraction barriers, which must clearly separate the concepts into distinct partitions. Adding a “point” to a “vector” is a bug.

Every overly abstract theory is “possible” only because these type errors (abstraction violations) go unnoticed.

Weighted sums at the core.

A linear combination is both scaling and addition, so the result is a weighted sum. \[cv + dw\] Either or both of \(c\) and \(d\) could be \(0\) or \(1\) as special cases.
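
A minimal sketch of this (Python/NumPy; the library choice and the particular numbers are mine, not from the text):

  import numpy as np

  v = np.array([1.0, 2.0])
  w = np.array([3.0, 4.0])
  c, d = 2.0, 0.5

  # scaling, then addition: the weighted sum c*v + d*w
  result = c * v + d * w
  print(result)   # [3.5 6. ]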

A polynomial is a special case of a weighted sum.

Matrices may represent the coefficients of a polynomial or all the coefficients of a system of linear equations.
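
As a hedged illustration (the degree and the coefficients are my own example): evaluating a polynomial is a dot-product of its coefficient vector with a vector of powers of \(x\), i.e. a weighted sum.

  import numpy as np

  # p(x) = 2 + 3x + 5x^2, stored as a coefficient vector
  coeffs = np.array([2.0, 3.0, 5.0])

  x = 2.0
  powers = np.array([x**0, x**1, x**2])   # [1, x, x^2]

  # the value of the polynomial is the weighted sum of the powers
  p_of_x = coeffs @ powers
  print(p_of_x)   # 28.0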

The “map” function

Notice that a matrix-vector multiplication (a dot-product) is semantically equivalent to applying the “map function”, parameterized by the matrix, to a vector as a singleton list.

Matrix-matrix multiplication (dot-product) is (conceptually) applying the (first) matrix to a “sequence of columns” (of the second matrix).

The rules of a linear combination define the “implicit body” of the “map function”.

Thus it can be properly generalized to an abstract notion of a Functor.

This generalization has important consequences in viewing certain aspects of Linear Algebra as generalized uniform transformations.
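
A sketch of this “map” reading (NumPy; the matrices are my own toy examples): matrix-vector multiplication is mapping “dot with \(x\)” over the rows of the matrix, and matrix-matrix multiplication is mapping “multiply by \(A\)” over the columns of the second matrix.

  import numpy as np

  A = np.array([[1.0, 2.0],
                [3.0, 4.0]])
  x = np.array([5.0, 6.0])

  # A @ x is "map (dot with x) over the rows of A"
  mapped = np.array([np.dot(row, x) for row in A])
  assert np.allclose(mapped, A @ x)

  B = np.array([[1.0, 0.0],
                [2.0, 1.0]])

  # A @ B is "map (A @ column) over the columns of B"
  mapped_cols = np.column_stack([A @ col for col in B.T])
  assert np.allclose(mapped_cols, A @ B)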

Preliminary abstractions

Here we will talk about abstractions captured (by the mind of an external observer) and properly generalized (superimposed by the mind onto its perceptions).

  • A distance (the universal notion, measured in arbitrary “units”)
  • A straight line (the shortest distance between two dots, extended into “infinity”)
  • A projection (the shadow of a stick, literally)
  • A perpendicular (a vertically planted stick, “no shadow”)
  • An arbitrary angle (knowing the length of the stick and its shadow)

Compound abstractions

  • The Origin (I am, zero distance)
  • The Number Line (of arbitrary but evenly-spaced dots)
  • The Right Triangle (with a right angle and its unique properties)
  • The Cartesian plane (of two perpendicular number lines or axes)
  • Projection (a perpendicular to the axis, and the distance from the Origin)
  • The Three-Dimensional space (three mutually-perpendicular planes)

    Notice that we cannot go beyond 3 without losing the fundamental notion of orthogonality (of being mutually perpendicular).

This notion is real: it is because of it that any motion is “independent in each direction”, or, more precisely, that anything moves independently in each of the 3 (and only 3) “dimensions”.

No, time is a derived abstraction and is independent but NOT orthogonal to all 3 others. They must be clearly separated by an abstraction barrier, due to being at different levels (and different kinds) of abstraction.

The ability of the mind to just add (superimpose) a time-axis, and project onto it does not imply that this is how Things Are. This is an abstract, mind-only methodology.

Projections to an abstract evenly-spaced axis may be justified, but the orthogonality is only an assumption.

The Number line

The required properties

  • a straight line (the shortest distance between two dots, extended to infinity)
  • evenly spaced “dots” at an arbitrary (but the same) distance
  • abstract imaginary operations of stretching or shrinking uniformly (every distance remains equal to all others)

The origin

  • A distance from the origin (how far away)
  • A unit (the distance of exactly one-more-than-zero)
  • “Count the dots” (name the dots to get a Number)
  • A “Number of dots from the Origin” (the distance in units)

A perpendicular and the right angle

  • The origin (can be placed arbitrarily)
  • A perpendicular or orthogonal axis (no shadow, zero-length projection)
  • Two axes “sweep” a perfectly flat abstract plane
  • Coordinates (distances from the origin along the axes)
  • Vectors (which originate in the Origin)
  • A unit vector (a pair of coordinates, with the implicit \((0, 0)\) of the Origin) along some axis

Vectors

Both a row-vector and a column-vector contain exactly the same information, and both denote an ordered sequence.

The choice of writing it as a row or as a column is arbitrary, and the operation of “rotation” (or transposition) has been defined on vectors. The order is preserved.

People would say that a vector is only a column-vector, and a row-vector is a 1-by-n matrix, so the rules of matrix-vector multiplication apply.

The subtlety is that the dot-product operation for two vectors is commutative, while matrix-vector multiplication in general is not.

When we begin to “stack” individual vectors into matrices, the distinction becomes important due to how operations on matrices are defined.
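
A sketch of the distinction (NumPy; the particular matrices are mine): the dot-product of two vectors is commutative, stacked matrices are not, and transposition is how a “row” becomes a “column”.

  import numpy as np

  v = np.array([1.0, 2.0])
  w = np.array([3.0, 4.0])

  # the dot-product of two vectors is commutative
  assert np.dot(v, w) == np.dot(w, v)

  # once vectors are stacked into matrices, the order matters
  A = np.array([[1.0, 2.0],
                [3.0, 4.0]])
  B = np.array([[0.0, 1.0],
                [1.0, 0.0]])
  assert not np.allclose(A @ B, B @ A)

  # a row-vector as a 1-by-n matrix, and its transpose as a column
  row = v.reshape(1, -1)   # shape (1, 2)
  col = row.T              # shape (2, 1)
  assert col.shape == (2, 1)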

Linear transformation

Uniform transformations, where each distance remains equal to every other: with rotations nothing changes, and with other transformations the change is uniform (for all dots).

A dot-product of two distinct unit vectors (along different axes) is zero.

This is because “the other coordinates” in a unit vector are exactly zeros (meaning no projections, which is the notion of being a perpendicular).

The pair-wise multiplied weighted sum therefore is \(0\).
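
A minimal check of this (NumPy): the standard unit vectors along different axes have zeros in “the other coordinates”, so every position-wise product has a zero factor.

  import numpy as np

  e1 = np.array([1.0, 0.0, 0.0])
  e2 = np.array([0.0, 1.0, 0.0])

  # every position-wise product has a zero factor, so the weighted sum is zero
  assert np.dot(e1, e2) == 0.0
  # a unit vector dotted with itself is 1
  assert np.dot(e1, e1) == 1.0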

This abstract numerical notion can be generalized beyond 3 dimensions, but the connection to reality is lost.

Based on this result they postulate that whenever the dot-product of vectors is zero, they are perpendicular to each other.

This is, of course, bullshit.

But we have hyper-planes, hyper-cubes and hyper-spheres to “worship” in sects.

This is exactly what it means to have a too-general (or too abstract) abstraction. Purely numerical operations (and their application) may not correspond to anything real (What Is). This is precisely what is wrong with modern model- and statistics-based “science”.

Patterns

There is a subtle universal pattern, a weighted sum, which, it seems, underlies everything.

Packaging into vectors is more general than just the Cartesian Plane or a 3D space.

This is where abstraction barriers or the notion of typing has to be employed. The operations are too general and the meaning is easily lost.

We have \(2\) fundamental operations

  • \(+\), in the most universal sense of adding together
  • \(*\), which is not times but scaling or weighting

Addition is “for the notions of distances and lengths”; scaling is for “the notions of angles and projections”.

Geometric and Numerical interpretations

The abstract numeric line

An imaginary abstraction of a straight line composed of evenly-spaced dots (the same distance, or length of the intervals, between them), superimposed by the mind as a “scale”, an abstract counting and measuring “device”. An abstracted-out “stick with notches”, of course.

  • distances and lengths

    A vector has length (or a magnitude) and a direction or an angle.

At the level of vectors and matrices (which are “stacked” vectors)

  • \(\cdot\), a dot-product, which is a position-wise scaled weighted sum
  • \(\times\), a cross-product, which is an “application” of one vector to another

And for vectors we have a universal linear structure, in the sense that there has to be the same number of “positions”, they have to be in the same order, and, to have any meaning, they must denote the same “dimensions”.

All the operations on vectors (and matrices) are positional, so the dimensions must match up.

dot-product

A dot-product, \(v_1 \cdot v_2\), is just position-wise multiplication (scaling) and then adding everything up.

\(\begin{bmatrix} 1 \\ 2 \end{bmatrix} \cdot \begin{bmatrix} 3 \\ 4 \end{bmatrix} = 1 \times 3 + 2 \times 4 = 11\)

The meaning of the whole operation is scaling of one vector by another. It is commutative, because both scalar-level operations involved are commutative, and, intuitively, due to the symmetry.

The geometric meaning is a projection of one vector onto a number line that goes through the other; the result denotes the (scaled) distance from the Origin.

It does not matter which vector is being projected onto which, the resulting (scaled) distance will be the same.

The distance (the length of a projection, which is not a vector but a scalar) will be \(0\) for perpendicular vectors. This is why we need a strict type-discipline.

Numerically, this operation is a dual to matrix-vector multiplication - the same pair-wise weighted sum.

\(\begin{bmatrix} y_{1} & y_{2} \end{bmatrix} \begin{bmatrix} x_{1} \\ x_{2} \end{bmatrix}\)
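
A sketch of the numbers above (NumPy; the projection formula is the standard scalar projection, my assumption about what “the length of a projection” refers to):

  import numpy as np

  v1 = np.array([1.0, 2.0])
  v2 = np.array([3.0, 4.0])

  # position-wise scaling, then adding everything up
  assert np.dot(v1, v2) == 1*3 + 2*4        # 11
  assert np.dot(v1, v2) == np.dot(v2, v1)   # commutative

  # the (signed) length of the projection of v1 onto the line through v2
  proj_length = np.dot(v1, v2) / np.linalg.norm(v2)

  # perpendicular vectors: the projection length is 0
  assert np.dot(np.array([1.0, 0.0]), np.array([0.0, 1.0])) == 0.0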

cross-product

A cross-product \(A \times B\) is an application of one vector to another, which is a fundamentally different operation and, just like function composition \(\circ\), is not commutative.
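
A minimal sketch (NumPy, 3-dimensional vectors of my choosing): swapping the arguments of the cross-product flips the sign, so it is not commutative.

  import numpy as np

  a = np.array([1.0, 0.0, 0.0])
  b = np.array([0.0, 1.0, 0.0])

  # a x b and b x a point in opposite directions
  assert np.allclose(np.cross(a, b), [0.0, 0.0, 1.0])
  assert np.allclose(np.cross(b, a), -np.cross(a, b))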

Projection to a lower dimension

It can be shown that any point in 2 coordinates can be projected onto an arbitrarily chosen line (going through the Origin), and any point in 3 coordinates onto an arbitrarily chosen plane (and, in turn, onto a line).

However, these operations are irreversible and imply a loss of information.

Generalization to “higher dimensions” is justified only numerically, again on the same abstract assumption that a zero dot-product means orthogonality.
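
A minimal sketch of such a projection, assuming the standard formula for projecting onto a line through the Origin (the direction and the points are my own examples); note how two distinct points collapse onto one projection, which is the loss of information.

  import numpy as np

  def project_onto_line(p, direction):
      # project the point p onto the line through the Origin along `direction`
      d = direction / np.linalg.norm(direction)
      return np.dot(p, d) * d

  line = np.array([1.0, 1.0])
  p1 = np.array([2.0, 0.0])
  p2 = np.array([0.0, 2.0])

  # two different points end up at the same projection: the operation is irreversible
  assert np.allclose(project_onto_line(p1, line), project_onto_line(p2, line))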

Matrices

The tradition says that vectors are columns, or that only column-vectors are vectors. What we call a row-vector is, according to tradition, a 1-by-n matrix. When a matrix has \(m\) rows, each column is a vector in an m-dimensional space.

Linear combination of column-vectors.

The matrix-vector multiplication - \(Ax\) (scaling). \(\mathbf{A} = \begin{bmatrix} y_{1} & y_{2} & \ldots & y_{n}\end{bmatrix}, \quad \mathbf{x} = \begin{bmatrix} x_{1} \\ x_{2} \\ \vdots \\ x_{n}\end{bmatrix}\)

\(Ax\) is a combination of the columns of \(A\).

The equation \(Ax = b\) is of a combination that produces \(b\). The solution is the vector \(x\).

  • the direct solution: forward elimination and back substitution algorithm.
  • the matrix solution: \(x = A^{-1}b\) (if \(A\) has an inverse).
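
A sketch of both routes (NumPy; the particular matrix is my own example). np.linalg.solve performs an elimination-based factorization internally; the explicit inverse is shown only to match the formula above and is not how one would normally solve the system.

  import numpy as np

  A = np.array([[2.0, 1.0],
                [1.0, 3.0]])
  b = np.array([3.0, 5.0])

  # the "direct" route: elimination (done internally by solve)
  x = np.linalg.solve(A, b)

  # the "matrix" route: x = inverse(A) @ b (only if A has an inverse)
  x_inv = np.linalg.inv(A) @ b

  assert np.allclose(x, x_inv)
  assert np.allclose(A @ x, b)   # x is the combination of columns that produces b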

General Matrix-vector

Dot-product (an inner-product) \[\begin{bmatrix} y_{1} & y_{2} & \cdots & y_{n} \end{bmatrix} \cdot \begin{bmatrix} x_{1} \\ x_{2} \\ \vdots \\ x_{n} \end{bmatrix} = \begin{bmatrix} y_{1} \\ y_{2} \\ \vdots \\ y_{n} \end{bmatrix} \cdot \begin{bmatrix} x_{1} \\ x_{2} \\ \vdots \\ x_{n} \end{bmatrix}\] Notice how there is only one way to “transpose” a 1-by-n matrix into a column-vector. \[\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23}\end{bmatrix} \cdot \begin{bmatrix} x_{1} \\ x_{2} \\ x_{3} \end{bmatrix} = \begin{bmatrix} DP_{1} \\ DP_{2} \end{bmatrix}\] So we are scaling the matrix \(A\) (by \(x\)) and collecting the weighted sums into a new column-vector (\(b\)).
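
A sketch of the 2-by-3 case above (NumPy; the numbers are mine): each row produces one dot-product \(DP_i\), and the same result is a weighted combination of the columns.

  import numpy as np

  A = np.array([[1.0, 2.0, 3.0],
                [4.0, 5.0, 6.0]])
  x = np.array([7.0, 8.0, 9.0])

  # row view: one dot-product per row
  dp = np.array([np.dot(A[0], x), np.dot(A[1], x)])

  # column view: a weighted sum (combination) of the columns of A
  comb = x[0]*A[:, 0] + x[1]*A[:, 1] + x[2]*A[:, 2]

  assert np.allclose(dp, A @ x)
  assert np.allclose(comb, A @ x)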

Matrix-matrix

Each additional column (of the second matrix) produces a new column in the result. \[\begin{bmatrix} a_{1} & a_{2} & a_{3}\end{bmatrix} \cdot \begin{bmatrix} x_{11} & x_{12} \\ x_{21} & x_{22} \\ x_{31} & x_{32}\end{bmatrix} = \begin{bmatrix} DP_{1} & DP_{2}\end{bmatrix}\] Notice that this operation is not commutative and is column-oriented. \[\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23}\end{bmatrix} \cdot \begin{bmatrix} x_{11} & x_{12} \\ x_{21} & x_{22} \\ x_{31} & x_{32}\end{bmatrix} = \begin{bmatrix} DP_{11} & DP_{12}\\ DP_{21} & DP_{22}\end{bmatrix}\] So, again, we are scaling position-wise by columns and collecting weighted sums into a new “collection of columns”.
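
A sketch of the column-oriented reading (NumPy; the matrices are my own): each column of the second matrix yields one column of dot-products in the result, and the shapes alone show the operation is not commutative.

  import numpy as np

  A = np.array([[1.0, 2.0, 3.0],
                [4.0, 5.0, 6.0]])
  X = np.array([[1.0, 0.0],
                [0.0, 1.0],
                [1.0, 1.0]])

  # each column of X produces one new column of the product
  cols = np.column_stack([A @ X[:, j] for j in range(X.shape[1])])
  assert np.allclose(cols, A @ X)

  # A @ X is 2-by-2, while X @ A would be 3-by-3: not commutative
  assert (A @ X).shape == (2, 2)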

The one-hot encoding

Create a fixed-size vocabulary (sort the words in any order). A vocabulary-length vector with all zeroes and a \(1\) at the “index” of the word (into the vocabulary).

Properties:

  • dot-product with itself is \(1\)
  • dot-product with any other word is \(0\)

    This “means” (an abstract assumption) that each word representation is perpendicular to every other in a “hyper-cube”.

    This, of course, is socially-constructed bullshit. Chuds swim in it. The numerical properties, however, are valid and useful.
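
A sketch of the construction and of the two numerical properties above (Python/NumPy; the toy vocabulary is my own):

  import numpy as np

  vocabulary = ["cat", "dog", "fish"]   # a fixed-size vocabulary, sorted in any order

  def one_hot(word):
      # a vocabulary-length vector of zeros with a 1 at the word's index
      v = np.zeros(len(vocabulary))
      v[vocabulary.index(word)] = 1.0
      return v

  assert np.dot(one_hot("dog"), one_hot("dog")) == 1.0   # with itself: 1
  assert np.dot(one_hot("dog"), one_hot("cat")) == 0.0   # with any other word: 0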

The “Positional encoding”

The notion of a distance after projection is what underlies the “positional encoding” of language models.

Changing the distances conceptually “clusters” the words in a multi-dimensional hyper-space.

In practice it is a form of indexing, conceptually related to sorting (ordering).
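
As an illustration only (this is an assumption about what is meant, not something stated here): the sinusoidal positional encoding of the original Transformer is one concrete way of turning a position index into a vector, so that distances between the resulting vectors reflect the distances between positions.

  import numpy as np

  def positional_encoding(position, d_model=8):
      # Transformer-style sinusoidal encoding of a single integer position
      i = np.arange(d_model // 2)
      angles = position / np.power(10000.0, 2 * i / d_model)
      enc = np.zeros(d_model)
      enc[0::2] = np.sin(angles)
      enc[1::2] = np.cos(angles)
      return enc

  p0, p1, p10 = (positional_encoding(p) for p in (0, 1, 10))
  # in this example a larger gap between positions shows up as a larger distance
  print(np.linalg.norm(p0 - p1), np.linalg.norm(p0 - p10))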

Author: <schiptsov@gmail.com>

Email: lngnmn2@yahoo.com

Created: 2023-08-08 Tue 18:38

Emacs 29.1.50 (Org mode 9.7-pre)