GPT Bullshit all over again
That @karpathy bullshitter is trending on HN again, which is not merely annoying, but discredits everything that is good and true about science and true philosophy.
Let's collapse some bullshit for demonstration and teaching purposes, and we will collapse the whole fucking cathedral of bullshit by pointing at flaws in the foundation.
To accomplish this we have to use lots of metaphors as generalizations and references to different fields, which, of course, unlike abstract models, converge to What Is.
There are thousands of years of mathematics, philosophy of Mind in the East and logic in the West, and some of the results they have produced are valid, captured generalizations, which cannot be just dismissed or swept under the rug.
What are the mind and language in principle (what questions!)? Well, operationally, the mind is an observer which continuously builds and maintains an inner representation of its environment and uses this representation for taking actions.
This is, of course, a generalization from the studies of intelligent agents. What is important, however, is that such generalizations are not wrong, just too general. They abstract away the specifics and the actual “implementation”.
These inner representations are not built from scratch each time. On the contrary, the structural template of the brain regions has evolved over millions of years, and it reflects the constraints of the environment in which it happened to evolve.
Again, the structure is not arbitrary; it converges gradually to match What Is out there.
Language began as a medium of communication, an encoding for signaling and transmission between individuals. Gradually, it became the medium of reasoning and a “storage” of a shared culture, which emerged due to the use of a common language.
Notice that neither the inner representation nor the language itself is “knowledge”. It is a heavily overloaded word.
Early NLP guys came up with a simple model of communication, where parts of the inner representation are encoded for verbal (oral or written) communication (due to some neurochemical spikes). The communication is usually augmented by non-verbal emotional cues - volume, pitch, gestures, and it can be emotionally charged by using some common idioms (offensive or even obscene).
Again, these are valid generalizations; they are not wrong, they have been “evolved”. These are fragments of a partially solved Jigsaw puzzle, a multidimensional one, which can be partially solved at different scales (this is another metaphor, of course).
Solving a Jigsaw puzzle is the best known metaphor for true knowledge acquisition. There is an actual environment in which we are, and that environment is in turn a particular locality in what we call the Universe, which, in turn, is a manifestation of What Is, partially observed by the Mind.
The ability to zoom in and out through these levels of generalizations and valid abstractions (acquired as “knowledge”) is what constitutes intelligence. An intelligent agent, again, is a process that acts according to its internal representation of the environment. The less distortion, the better for it.
Mathematics is a set of valid generalizations and useful abstractions made by the mind of an external observer, who observes (and generalizes from) the actual patterns in What Is. The “laws” of arithmetic are the observed properties of What Is. Addition is a generalization of putting things together (being in the same locality), and so on.
To do mathematics, the Mind does “classification” (pattern-recognition, labeling and grouping). Arguably, pattern-recognition, grouping and identification (dogs know each other) belong to the animal mind, while labeling is an activity which requires the language centers.
What we call logic just arises naturally. It is just the discipline of correct labeling and of using a human language unambiguously. The best thinkers of all time wrestled with the imperfections of language and of the Mind. They got some important results.
The most fundamental is that just one single flawed statement or wrong premise collapses everything that follows from (is based on) it.
This is what any language model lacks in principle - the actual validation of all the previous steps (or structures).
The process of construction of a representation must be different (and it is well-known) - each change must trigger a verification process of the whole structure, and everything “above a flaw” has to be demolished (backtracking).
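To make "verify the whole, demolish everything above a flaw" concrete, here is a minimal sketch of classic backtracking. The names `build_verified` and `no_queen_attacks` are hypothetical, and the 4-queens puzzle is only a stand-in for the Jigsaw metaphor, not a claim about how a mind or any particular system is implemented. Every added piece triggers a check of the entire partial structure, and a failed check removes the piece so that construction resumes from the last valid state.

```python
from typing import Callable, List, Optional, Sequence


def build_verified(
    options: Sequence[Sequence[int]],
    whole_is_valid: Callable[[List[int]], bool],
) -> Optional[List[int]]:
    """Build a structure level by level, re-verifying the whole after each change."""
    structure: List[int] = []

    def extend(level: int) -> bool:
        if level == len(options):
            return True  # every level is filled and the whole structure is valid
        for piece in options[level]:
            structure.append(piece)
            # Verify the entire structure, not just the newest piece.
            if whole_is_valid(structure) and extend(level + 1):
                return True
            # Flaw found here or further up: demolish and return to the last valid state.
            structure.pop()
        return False

    return list(structure) if extend(0) else None


def no_queen_attacks(cols: List[int]) -> bool:
    """Illustrative whole-structure check: cols[r] is the queen's column in row r."""
    for r1 in range(len(cols)):
        for r2 in range(r1 + 1, len(cols)):
            same_column = cols[r1] == cols[r2]
            same_diagonal = abs(cols[r1] - cols[r2]) == r2 - r1
            if same_column or same_diagonal:
                return False
    return True


solution = build_verified([range(4)] * 4, no_queen_attacks)
print(solution)  # [1, 3, 0, 2]
```

Notice that a wrong early choice (a queen in column 0 of the first row) is not "down-weighted". It is removed, together with everything that was built on top of it.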
Now, @karpathy, fanboys and believers, pay attention - the back-propagation process never eliminates wrong structures. It may diminish the weights, but it does not collapse the structure as it should. Collapsing, demolition is required.
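A toy illustration of "diminish, not collapse", under loud assumptions: a single weight `w`, a single input `x`, a target of zero and plain gradient descent on squared error. The function `train_toward_zero` is hypothetical and is not anyone's actual training code. Each step multiplies the weight by a constant factor smaller than one, so the "wrong" connection shrinks geometrically but is still present after training.

```python
def train_toward_zero(w: float, x: float = 1.0, lr: float = 0.1, steps: int = 50) -> float:
    """Gradient descent on loss = (w * x - 0) ** 2 for a single weight."""
    for _ in range(steps):
        grad = 2.0 * x * (w * x)  # derivative of (w * x) ** 2 with respect to w
        w -= lr * grad            # with x = 1 and lr = 0.1 this is w *= 0.8
    return w


w_final = train_toward_zero(1.0)
print(w_final)         # a very small positive number
print(w_final == 0.0)  # False: the weight is diminished, not removed
```

Removing the connection outright takes a separate, explicit operation (setting the weight to zero or deleting it), which this update rule does not perform on its own.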
The problem is in terminology. What you call “validation” is not validation, what you call “backprop” is not actual backtracking to the last valid state (in which the partial Jigsaw puzzle is valid and verified). What you have is not a “knowledge representation”, it is not even “knowledge”. It is information.
Information is not knowledge. Pay attention here. This result is as old as humanity itself.
What you manipulate and transform is information, not knowledge. Knowledge is prior to information about it. Knowledge extraction requires validation and verification of the whole at each step. There is no other way and there are no shortcuts, in principle.
Knowledge is What Is, and the information is mostly bullshit. Here is how.
Recall when you were joking with Fridman about prompting each other. So clever, top “data scientists” are joking.
But that prompting is a synonym for bullshit retrieval and has nothing to do with knowledge. It is information, encoded for communication, based on an inner representation of some individual. Given the social and cultural forces which shape this representation, it is guaranteed to be bullshit.
Having a less wrong representation, let alone a better solved Jigsaw, is extremely rare, as rare as the Buddha. Some other metaphors are required to explain.
You could “prompt” an African shaman or, even better, a Western “scholar” of a Tibetan tantric tradition (almost entirely made up by Western scholars), and retrieve what seems to be “knowledge”. This “seems to be” is the key to understanding what is really going on.
Imagine the rubble of an ancient city. This is what your actual “representation” looks like due to continuous intense bombardment with outside bullshit. The whole atmosphere (with its rains and tornadoes) of bullshit.
When you train a GPT model on camera, you are capturing the shape of that constantly changing rubble, assuming you are not losing the information (which you do by averaging). But it is not the rubble of a single individual’s bullshit, it is a snapshot of the written bullshit of millions of individuals, put through a shredder.
Yes, splitting into ngrams is exactly putting the texts through a shredder before reading them. This is a valid metaphor. Why? Because knowledge is not encoded at this level. It is not even at the level of nouns and verbs. It is not at the level of a particular language. Any of the 6000 human languages can be used to encode a verbalized metaphor, more or less wrong.
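For readers who have never seen the shredder, a minimal sketch of word-level n-gram splitting follows. The helper `ngrams` and the two example sentences are illustrative assumptions. GPT-style models actually split text into subword tokens rather than word n-grams, and this sketch does not reproduce that tokenizer.

```python
from typing import List, Sequence, Tuple


def ngrams(tokens: Sequence[str], n: int) -> List[Tuple[str, ...]]:
    """Return every run of n consecutive tokens, in order of appearance."""
    if n <= 0:
        raise ValueError("n must be positive")
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


a = "the dog bit the man".split()
b = "the man bit the dog".split()

print(ngrams(a, 2))
# [('the', 'dog'), ('dog', 'bit'), ('bit', 'the'), ('the', 'man')]

# Shredded down to single words, the two opposite statements are the same pile of strips.
print(sorted(ngrams(a, 1)) == sorted(ngrams(b, 1)))  # True
```

The strips keep local word order and nothing else. Who bit whom lives in how the strips are arranged, not in any one strip.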
The knowledge is at the level prior to any language.
Now, what is it that you retrieve? You retrieve structures (information) which look (to you and the others) like “knowledge”. They pass the tests of being well-formed sentences, and some semantic heuristics (seems legit LOL).
The problem is that this is the definition of sectarian bullshit, conspiracy and abstract theories, religious beliefs, and any other kind of familiar bullshit.
Do you remember what mathematicians, logicians and early scientists of the past were trying to do? They tried to discard and prune out bullshit. What you do is produce and multiply bullshit. Exactly, literally, in principle, by definition. Mere words are not knowledge.
By now even an ordinary reader would see why.
Mere observations and descriptive statistics are not enough to understand and to know the underlying causal relations. Controlled experiments and reduction of the results to what is already known (non-contradiction), to the partially solved Jigsaw puzzle, is the process of knowledge extraction.
It is the process of removing the bullshit to see what remains. You are doing the opposite with your models.
Now what is the whole process then? Well, the usual shamanistic rituals to impress the public. Cosplay of intelligence, cosplay of being super smart, cosplay of being way above average, instead of actually being less wrong.