Bullshit, bullshit everywhere
This turned into a real bullshitfest of Chuds.
Example: https://txt.cohere.ai/what-are-transformer-models/
“Transformers, on the other hand, keep track of the context of what is being written, and this is why the text that they write makes sense.”
Bullshit. Plain and simple.
“Context”, as a word used without clarification (without specifying which context!), is a pure abstraction, too general to be used meaningfully.
There are, however, a semantic context, a cultural context, a social context, and even the context of a particular usage of a language.
A model “is not aware of” or “keeping track of” any of these. All models work at the level of frequencies (and of “emergent directed graphs” – the actual “structures” that remain after all the gradient updates) of bit patterns, not even of letters, vowels or morphemes.
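To see what that means in practice, here is a minimal sketch (plain Python; byte-level tokenisation is assumed here for simplicity, while real models use BPE-style subword vocabularies): the only thing a model ever receives is a sequence of integers.

```python
# What a model actually "sees": a sequence of integer ids, not letters,
# vowels or morphemes. Byte-level tokenisation is assumed here; real systems
# use BPE-style subword vocabularies, but the principle is the same.
text = "the context of what is being written"

token_ids = list(text.encode("utf-8"))   # bit patterns, as integers 0..255
print(token_ids[:10])                    # [116, 104, 101, 32, 99, 111, 110, 116, 101, 120]

# Everything downstream (embeddings, "attention", gradients) operates on
# these numbers and their co-occurrence statistics, nothing else.
```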
What they call “attention” in the “standard terminology” has nothing to do with shifting of perception or selectivity. It is a “reshaping” of the “resulting structure”, by updating the weights, based on how frequently words appear next to or close to each other.
This has nothing to do with meaning; it is only a form of clustering, similar in principle to bucket sorting, but not a semantic categorisation.
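For the curious, here is a minimal sketch of what that “attention” actually computes (the standard scaled dot-product form is assumed; toy sizes and random matrices stand in for trained weights): it is an arithmetic re-weighting of vectors, nothing more.

```python
import numpy as np

# "Attention", scaled dot-product form: an arithmetic re-weighting of vectors.
# Toy sizes and random matrices stand in for trained weights.
rng = np.random.default_rng(0)
d = 4                        # toy embedding size
X = rng.normal(size=(3, d))  # three token vectors

Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
Q, K, V = X @ Wq, X @ Wk, X @ Wv

scores = Q @ K.T / np.sqrt(d)                    # pairwise similarities (numbers)
weights = np.exp(scores)
weights /= weights.sum(axis=-1, keepdims=True)   # softmax: just normalisation

output = weights @ V    # each position becomes a weighted average of the others
print(weights)          # a matrix of numbers; nothing here "understands" anything
```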
This is not just a metaphor. From a seemingly uniform, fully-connected structure, some frequently used “trails” emerge as a result of backprop.
Something similar in principle happens in the brain, which does “pruning” in the process of neuro-maturation, and this is how a child learns.
Trails in the snow (or in the mountains) are the best metaphor.
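A toy sketch of how such “trails” get carved (a tiny bigram model trained with gradient descent is assumed here purely for illustration):

```python
import numpy as np

# A toy bigram model: repeated gradient updates carve strong weights along
# the transitions actually seen, the way footsteps carve a trail in the snow.
vocab = ["the", "cat", "sat", "mat"]
idx = {w: i for i, w in enumerate(vocab)}
pairs = [("the", "cat"), ("cat", "sat"), ("the", "cat"), ("the", "mat")]

W = np.zeros((len(vocab), len(vocab)))   # uniform, "fully connected" at the start
lr = 0.5

for _ in range(200):
    for a, b in pairs:
        logits = W[idx[a]]
        probs = np.exp(logits) / np.exp(logits).sum()
        grad = probs.copy()
        grad[idx[b]] -= 1.0              # gradient of the cross-entropy loss
        W[idx[a]] -= lr * grad           # strengthen the trail actually walked

print(np.round(W[idx["the"]], 2))        # the "the -> cat" trail dominates "the -> mat"
```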
What it “keeps track of” is the “state”, which is passed along, but this state is not a synonym for a context as we know it in linguistics or semantics. It is just a numeric aggregate (in the shape of a matrix).
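What such a “state” looks like in code (a plain recurrent update is assumed here for illustration; in a transformer the analogous “state” is the cache of key/value matrices, but either way it is just numbers):

```python
import numpy as np

# The "state" that gets passed along: a vector of numbers updated by arithmetic.
# A plain recurrent update is assumed here for illustration.
rng = np.random.default_rng(1)
d = 4
W_in, W_state = rng.normal(size=(d, d)), rng.normal(size=(d, d))

state = np.zeros(d)
for token_vec in rng.normal(size=(5, d)):          # five incoming token vectors
    state = np.tanh(token_vec @ W_in + state @ W_state)

print(state)   # a numeric aggregate; call it "context" if you must
```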
One more time: there is only seemingly emergent behavior of the models and apparent intelligence.
Just as an uneducated peasant cannot tell intelligent, science-based talk from religious bullshit, most people cannot call bullshit on a model’s output. Only real experts in their fields can.
A model does not perform any inferences, deductions, inductions or anything at all. It just traverses the structure that resulted from its training.
To paraphrase the good philosophers: there is nothing in its output that was not in its training.
Pseudo-randomly selecting among possible “next words” does not produce a “new meaning”. It just produces parrot babbling or, literally, “talking bullshit”.
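That selection step, stripped of the mystique (toy logits and standard temperature sampling are assumed for illustration):

```python
import numpy as np

# "Pseudo-randomly selecting among possible next words": a weighted dice roll.
# Toy logits and standard temperature sampling are assumed for illustration.
rng = np.random.default_rng(42)
vocab = ["bullshit", "context", "meaning", "parrot"]
logits = np.array([2.0, 1.0, 0.5, 1.5])   # scores produced by the trained structure

temperature = 0.8
probs = np.exp(logits / temperature)
probs /= probs.sum()

next_word = rng.choice(vocab, p=probs)     # nothing here creates a new meaning
print(next_word, probs.round(2))
```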
Another Chud: https://apenwarr.ca/log/20230415
“Behold: emergent complexity, game-of-life, Wolfram’s pseudo-science! I am so fucking smart and know all the current memes!”
A human language is not a medium for creating emergent complexities; it is actually the opposite. It is the medium for the oral (and written) transmission of descriptions of particular aspects of the shared environment between humans.
Any person with a classical education in math, philosophy, logic and writing (just like me, LOL) will tell you that.
We are not interested in emergent verbalized bullshit. We already have whole Himalayas of it in written form and on social media.