Meta-Analysis: How AI Predictions from 2017 Hold Up Today

Published on 2025-03-27 by Mohd

Revisiting "AI is not going to kill you!"

In 2017, I wrote a post titled "AI is not going to kill you!" where I made several key assertions about AI and its trajectory. Eight years later, it's time to critically examine those claims against what's actually happened in AI. Some of my assertions have held up remarkably well, while others have been thoroughly demolished by recent developments.

What I Got Right

My central thesis in 2017 remains valid: there is a fundamental difference between narrow AI and Artificial General Intelligence (AGI). This distinction is perhaps even more important today given the tsunami of AI hype we're drowning in.

I correctly identified that deep learning was being successfully applied to specific problems like speech recognition, object recognition, and natural language processing. These narrow applications have indeed flourished, exactly as I predicted.

What I Got Spectacularly Wrong

Several of my 2017 statements now appear almost comically outdated:

  1. "We have made little to no progress in realizing AGI in the past few decades." This assertion has been thoroughly contradicted. While true AGI still doesn't exist, the gap between narrow AI and AGI has narrowed dramatically. Modern LLMs like GPT-4, Claude, and Llama 2 demonstrate capabilities that obliterate traditional boundaries:

    • They reason across completely different domains with surprising coherence
    • They exhibit emergent abilities nobody explicitly programmed into them
    • They solve problems they were never trained on through zero-shot learning
  2. "All neural networks only produce useful outputs for inputs they were trained for." This couldn't be more wrong. Today's foundation models generalize far beyond their training data. GPT-4 writes code, crafts poetry, passes bar exams, and reasons through complex problems - none of which were explicitly encoded in its training. The generalization capabilities of these systems would have been utterly inconceivable in 2017.

  3. "Architecture and hyperparameters are significantly different from problem to problem." The transformer architecture completely upended this assumption. Rather than building specialized architectures for each task, we now fine-tune or prompt general-purpose models for virtually any application. This shift from task-specific models to general-purpose architectures represents one of the most profound paradigm shifts in AI history.
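The shift from task-specific architectures to prompting can be made concrete with a toy sketch: the per-task engineering reduces to a text template handed to one frozen general-purpose model. The template and task names below are purely illustrative, not any particular model's API.

```python
def make_prompt(instruction: str, user_input: str) -> str:
    """One frozen general-purpose model; the per-task 'architecture'
    is reduced to an instruction expressed in plain text."""
    return f"{instruction}\n\nInput: {user_input}\nOutput:"

# The same template serves tasks that once required bespoke models.
translation = make_prompt("Translate English to French.", "Good morning")
summary = make_prompt("Summarize in one sentence.", "A long article ...")
```

In 2017 each of these tasks would have meant a different architecture and a different hyperparameter search; today the difference between them is literally a string.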

The Scaling Hypothesis: A Partial Victory, Then a Wall

The scaling hypothesis - the idea that simply throwing more compute and data at models would automatically unlock new capabilities - seemed almost miraculous... until it didn't.

In 2020, OpenAI's GPT-3 with its 175 billion parameters demonstrated capabilities not seen in smaller models, seemingly validating the "bigger is better" approach. But by 2025, we've crashed into a troubling reality: pre-training performance gains have plateaued dramatically.

The widely anticipated GPT-4.5 proved deeply disappointing to the AI community, showing only marginal improvements despite massive increases in compute and data. This plateau strongly suggests we're approaching fundamental limitations in the current paradigm of scaling pre-training alone.
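The diminishing returns behind that plateau are easy to see in a toy power-law loss curve, in the spirit of published scaling-law papers. The constants below are invented for illustration, not fitted to any real model.

```python
def toy_loss(n_params: float, floor: float = 1.7,
             a: float = 400.0, alpha: float = 0.34) -> float:
    """Toy scaling law: loss decays as n_params**-alpha toward an
    irreducible floor, so each successive 10x of scale buys less."""
    return floor + a / (n_params ** alpha)

# Absolute improvement from each successive 10x in parameter count.
gains = [toy_loss(10.0**k) - toy_loss(10.0**(k + 1)) for k in range(9, 12)]
```

Each 10x step shrinks the remaining gap by the same multiplicative factor, so the curve flattens even though it never strictly stops improving - which is exactly what "marginal improvements despite massive increases in compute and data" looks like from the outside.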

We've seen this pattern before. The original ARC-AGI-1 benchmark was eventually saturated by models like OpenAI's o3, but only through brute-force approaches that reportedly cost thousands of dollars per task. This is a critical point that's often overlooked - these "successes" were achieved through computational excess rather than genuine intelligence. A human brain running on less than 100 watts of power can solve ARC-AGI tasks with ease, while our most advanced AI systems require data centers consuming megawatts.

The recently released ARC-AGI-2 benchmark - out just a couple of days before this post - directly addresses this efficiency gap: it measures not only performance but also computational efficiency. According to its creators, even the most powerful frontier foundation models are expected to score in the single digits. That anticipated poor performance exposes the growing chasm between the breathless hype surrounding these systems and their actual capabilities on challenging reasoning tasks.

Test-Time Scaling: Our Only Salvation

What's become increasingly clear is that without innovations in test-time scaling - techniques such as chain-of-thought reasoning, self-consistency sampling, and search against verifiers, all of which spend extra compute at inference rather than during training - progress would have stalled completely. These approaches have become essential workarounds for pre-training limitations.

Our heavy reliance on test-time innovations rather than pre-training breakthroughs should be deeply concerning to anyone investing billions in scaling up model size and training.
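One of the simplest test-time scaling tricks, self-consistency, fits in a few lines: sample several answers to the same question and take a majority vote, trading extra inference compute for reliability. The sample values below are made-up stand-ins for stochastic model outputs, not real model runs.

```python
from collections import Counter

def majority_vote(answers):
    """Self-consistency: aggregate several independent samples for the
    same question and return the most frequent final answer."""
    return Counter(answers).most_common(1)[0][0]

# Stand-in for 7 stochastic samples from one model; 3 go astray,
# but the vote still recovers the consensus answer.
samples = [41, 42, 42, 43, 42, 42, 44]
consensus = majority_vote(samples)  # 42
```

The point of the sketch is the economics: every extra unit of accuracy here is bought with more inference compute on a fixed model, not with a better pre-trained model - which is precisely the trade the field has been forced into.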

Are We Actually Closer to AGI?

In 2017, I dismissed AGI as "a hopeless dream." The landscape has shifted dramatically, but not nearly as dramatically as the hype suggested just a few years ago. The expert community is increasingly fragmented, with predicted timelines now ranging from a few years to several decades - or never.

The unexpected plateau in model performance has deflated many ambitious timelines. The path to AGI now appears far more challenging than the straightforward scaling narrative that dominated from 2020-2023.

Safety Concerns: Not So Ridiculous After All

In 2017, I dismissed existential risk concerns as mere sensationalism. While I still believe comparing modern AI to Skynet is absurd, legitimate safety concerns have emerged that demand serious attention - among them large-scale disinformation, automated cyber-offense, and the misuse of increasingly capable dual-use systems.

The reality is that these systems are becoming increasingly powerful tools that can be wielded by humans with various agendas. Even if they never develop "agency" in the sci-fi sense, they amplify human capabilities in ways that could prove destabilizing.

Conclusion

My 2017 post was technically accurate at the time but completely missed the acceleration in AI capabilities we've witnessed. The gap between narrow AI and AGI remains, but it has narrowed in complex and unexpected ways.

The next five years will likely bring a much more difficult journey than many predicted. Can we overcome the pre-training plateau through architectural innovations? Will test-time scaling techniques continue compensating for these limitations? Or do we need to completely rethink our approach to AI advancement?

One thing is certain: dismissing concerns about advanced AI as mere sensationalism no longer makes sense. As these systems grow more capable, alignment with human values isn't just an academic exercise - it's essential for ensuring that AI development remains beneficial rather than harmful.