Large language models can score surprisingly well on some IQ-style item sets, with reported figures around 120-135 on certain tests and even higher on some pattern benchmarks. But these numbers reflect training data and pattern matching, not human general intelligence (g), and IQ norms were never designed for machines, so any single 'AI IQ' figure deserves heavy caveats.
On some tests an AI can produce a high score, but with major caveats. Modern LLMs do well on verbal analogies and certain matrix items, and reported scores have landed around 120-135 on specific tests. However, results vary widely by test and by model version, and a high score on selected items is not the same as possessing human-style intelligence.
An AI's score on an IQ test mainly measures how well the model pattern-matches against its training data on that particular item format. IQ tests assume a human test-taker with limited working memory, finite processing speed, and no prior exposure to the exact items. An AI violates all of those assumptions, so its score reflects statistical learning, not the underlying construct (g) that IQ tests were built to estimate in people.
A human IQ score does not transfer cleanly to a machine because IQ norms are calibrated entirely on human populations, with a mean of 100 and a standard deviation of 15. An AI has no developmental history, no biological constraints on memory or speed, and may have effectively seen similar problems before. Placing a machine on a human bell curve is a category error, even when the arithmetic produces a number.
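The norming arithmetic mentioned above (mean 100, standard deviation 15) can be sketched in a few lines of Python. This is a minimal illustration, not any test publisher's scoring procedure: the function names and the raw-score norm values (a norm-group mean of 40 and standard deviation of 5) are hypothetical, chosen only to make the numbers easy to follow.

```python
from statistics import NormalDist

# Conventional deviation-IQ scale, as stated in the text.
HUMAN_MEAN_IQ = 100
HUMAN_SD_IQ = 15


def deviation_iq(raw_score, norm_mean, norm_sd):
    """Map a raw test score onto the IQ scale using a HUMAN norm group.

    norm_mean and norm_sd describe raw scores in the human norming sample
    (hypothetical values in this sketch).
    """
    if norm_sd <= 0:
        raise ValueError("norm_sd must be positive")
    z = (raw_score - norm_mean) / norm_sd  # distance from the human mean, in SDs
    return HUMAN_MEAN_IQ + HUMAN_SD_IQ * z


def human_percentile(iq):
    """Percentile of an IQ value within the human normal distribution."""
    return NormalDist(mu=HUMAN_MEAN_IQ, sigma=HUMAN_SD_IQ).cdf(iq) * 100


# Hypothetical example: a raw score of 50 against human norms of 40 +/- 5.
ai_iq = deviation_iq(raw_score=50, norm_mean=40, norm_sd=5)
print(ai_iq)                              # 130.0
print(round(human_percentile(ai_iq), 1))  # 97.7
```

The calculation runs without complaint for any test-taker, which is the point: the output only says where a raw score falls relative to the human norming sample. It carries no information about whether the assumptions behind that comparison hold for a machine.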
AI scores swing so much between tests because performance is highly sensitive to item format, how the question is phrased, and what the model was trained on. The same system can look brilliant on text-based verbal reasoning yet stumble on novel visual or spatial puzzles that humans find easy. This instability is itself a clue that the score is measuring a narrow skill, not a stable, general capacity.
Projections of future 'AI IQ' figures are speculation, not measurement, and should be labeled as such. A forecast of a single rising 'AI IQ' number treats a shaky, test-dependent metric as if it were a fixed human trait on a predictable trajectory. Treat any specific future 'AI IQ' figure as a forecast or marketing claim, not an established fact.
| Test / benchmark | Reported AI result | What it actually measures |
|---|---|---|
| Verbal analogy items | High; often above typical human performance | Pattern matching over vast text training data |
| Progressive matrices (some sets) | Reported scores ~120-135 | Visual pattern recognition on familiar formats |
| Novel / unusual reasoning items | Inconsistent, sometimes poor | Brittleness when problems fall outside training |
| Caveat: any single 'AI IQ' | Varies wildly by test | Not human general intelligence (g); norms are human-only |