Module 2: Current AI Capabilities
Understanding AGI
On MMLU, a benchmark spanning 57 academic subjects, from law to medicine to formal logic, frontier models now score at or near the level of human experts. Read that again. Not one domain. Fifty-seven.
So here's the uncomfortable question this module forces you to confront: if current AI isn't AGI, what exactly is it?
The honest capability map
Let's cut through the marketing. Here's what AI can actually do well right now, with real numbers.
Language and reasoning: GPT-4 scores in the 90th percentile on the bar exam. Claude can analyse 200,000 tokens of text in a single prompt: roughly 150,000 words, or about two novels (a code sketch follows below). These systems write professional-quality code, legal briefs, medical summaries, and marketing copy that passes expert review.
Science: AlphaFold 2 predicted the 3D structure of virtually every known protein (200 million structures), cracking a problem that had stumped biology for 50 years. In 2024, DeepMind's successor systems, such as AlphaProteo, started designing entirely new proteins that don't exist in nature.
Vision and multimodal understanding: GPT-4V can look at a photo of your fridge contents and suggest recipes. Google's Med-PaLM 2 answers US medical licensing exam questions at expert level, and its multimodal successors are being pointed at medical imaging. These aren't demos; they're deployed capabilities.
Mathematics: In 2024, AI systems began solving International Mathematical Olympiad problems. DeepMind's AlphaProof and AlphaGeometry 2 together scored at silver-medal level. These are problems that stump most PhD mathematicians.
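To make the long-context claim concrete, here's a minimal sketch using the Anthropic Python SDK. The model alias and the input file are assumptions, not verified specifics; check the current docs for whatever models exist when you run it.

```python
# Minimal sketch: feed an entire book to Claude in one prompt.
# Assumptions: the `anthropic` package is installed, ANTHROPIC_API_KEY is set,
# and the model alias below is still valid when you run this.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

with open("entire_novel.txt") as f:  # hypothetical ~150,000-word file
    book = f.read()

message = client.messages.create(
    model="claude-3-5-sonnet-latest",  # assumption: substitute a current model
    max_tokens=1024,
    messages=[
        {"role": "user",
         "content": book + "\n\nSummarise the three central conflicts in this novel."}
    ],
)
print(message.content[0].text)
```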
Where it still falls apart
Now here's what the hype merchants won't tell you.
Reliability is nowhere near human-level. AI systems hallucinate: they fabricate facts with complete confidence. GPT-4 still gets basic arithmetic wrong sometimes. Claude will cite papers that don't exist. In high-stakes domains (medicine, law, finance), this isn't a quirk. It's disqualifying without human oversight.
Planning over long horizons is weak. Ask an AI to plan a complex project with dependencies, resource constraints, and uncertainty, and it'll give you something that looks right but falls apart under scrutiny. Multi-step reasoning under genuine uncertainty is still an unsolved problem.
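One cheap way to see this failure for yourself: plans that read fluently often hide dependency cycles no human planner would produce. A minimal sketch, using Python's standard-library graphlib on a hypothetical model-generated plan:

```python
# Check that a model-generated plan's task dependencies form a valid ordering.
# The plan below is a made-up example of the kind of output that reads well
# but is unschedulable. Each key maps a task to the tasks it depends on.
from graphlib import TopologicalSorter, CycleError

plan = {
    "launch":               ["marketing", "testing"],
    "marketing":            ["branding"],
    "testing":              ["build"],
    "build":                ["design"],
    "design":               ["stakeholder sign-off"],
    "stakeholder sign-off": ["launch"],  # oops: sign-off waits on launch
    "branding":             [],
}

try:
    order = list(TopologicalSorter(plan).static_order())
    print("Plan is schedulable:", order)
except CycleError as e:
    print("Plan falls apart under scrutiny; circular dependency:", e.args[1])
```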
Genuine understanding vs. sophisticated pattern matching. This is the big debate. When GPT-4 explains quantum mechanics, does it understand quantum mechanics? Or is it doing the world's most impressive autocomplete? The honest answer: we don't fully know. But the failures suggest something is missing. AI systems make errors that no human who truly understood the material would make.
Embodied interaction. Robotics is 10+ years behind language AI. Boston Dynamics robots can do backflips, but they can't reliably make you a cup of tea. The physical world is messy, unpredictable, and unforgiving in ways that text isn't.
The "jagged frontier": the key concept
Ethan Mollick at Wharton popularised this term (it comes from a 2023 Harvard/BCG field study he co-authored), and it's the single most useful framework for understanding current AI. The capability boundary isn't a smooth line. It's jagged.
AI is superhuman at some tasks and below average at others, and the boundary doesn't follow any intuitive pattern. It can write a better marketing email than most professionals but can't count the number of 'r's in "strawberry" (yes, really; this was a famous failure).
What this means for you: you cannot assume AI will be good or bad at a task based on how "hard" that task seems to humans. You have to test it. Every time. The jagged frontier is why people who actually use AI daily have a massive advantage over those who tried it once and made up their minds.
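Here's a minimal sketch of what "test it, every time" can look like. ask_model() is a hypothetical stand-in; wire it to whichever chat API you actually use. Each probe pairs a prompt with ground truth computed programmatically, so the model's confidence never enters into it.

```python
# A tiny jagged-frontier test harness. ask_model() is a placeholder:
# connect it to your chat API of choice before running.
def ask_model(prompt: str) -> str:
    raise NotImplementedError("connect this to your chat API of choice")

# Each probe is (prompt, ground truth computed by ordinary code).
probes = [
    ("How many times does the letter 'r' appear in 'strawberry'? "
     "Answer with a number.",
     str("strawberry".count("r"))),   # ground truth: 3
    ("What is 7380 * 4129? Answer with a number.",
     str(7380 * 4129)),               # exact arithmetic, no trust required
]

for prompt, truth in probes:
    answer = ask_model(prompt).strip()
    # Substring match is crude but fine for a sketch.
    verdict = "PASS" if truth in answer else f"FAIL (expected {truth})"
    print(f"{verdict}: {prompt[:50]}...")
```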
The acceleration that should have your attention
Here's what genuinely unsettles AI researchers: the rate of improvement.
- GPT-3 to GPT-4: under three years, with a capability jump that stunned even OpenAI
- Claude 2 to Claude 3.5 Sonnet: under a year, with coding ability improving by roughly 60%
- Open-source models in early 2025 match GPT-4's 2023 performance, meaning today's frontier capability becomes commodity capability within 12-18 months
This isn't linear improvement. And there's no sign it's slowing down. The three inputs (compute, data, and algorithms) are all still scaling. We haven't hit a wall. We haven't even seen the wall.
The practical upshot: whatever you think AI can't do today, check again in six months. Your mental model of AI capability has a half-life of about a year.
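Taking the half-life metaphor literally, the arithmetic looks like this:

```python
# If your capability map halves in accuracy every ~12 months (the module's
# metaphor, taken at face value), this is how fast it goes stale.
for months in (6, 12, 24, 36):
    remaining = 0.5 ** (months / 12)
    print(f"{months:>2} months out: ~{remaining:.0%} of your map still holds")
```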
Pick 10 tasks from your work or daily life. For each one:
1. Have AI attempt it (use ChatGPT, Claude, or Gemini)
2. Rate the output: 🟢 Better than me / 🟡 About the same / 🔴 Worse than me
3. Note whether the difficulty for humans predicted the AI's performance
You'll likely find at least 2-3 surprises: tasks you thought would be easy for AI that weren't, or tasks you thought would be hard that it nailed. That's the jagged frontier. Map it for your world.
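If you want to keep score, here's a lightweight sketch; the tasks and ratings below are placeholders for your own ten:

```python
# Record each task, your pre-test guess about whether AI would find it hard,
# and the traffic-light rating after actually testing. Placeholder entries.
audit = [
    # (task,                            guessed_hard_for_ai, rating)
    ("Draft a client status email",     False,               "🟢"),
    ("Plan a 3-month project rollout",  False,               "🔴"),
    ("Summarise a 40-page report",      True,                "🟢"),
]

# A "surprise" is any mismatch between your guess and the outcome.
surprises = [task for task, guessed_hard, rating in audit
             if (guessed_hard and rating == "🟢")
             or (not guessed_hard and rating == "🔴")]
print("Jagged-frontier surprises:", surprises or "none yet; keep testing")
```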
---
1. Current AI matches or exceeds human experts across dozens of academic domains; this is not narrow AI anymore
2. Reliability, long-horizon planning, and genuine understanding are still major gaps
3. The "jagged frontier" means AI capability doesn't follow human intuitions about task difficulty; you must test, not assume
4. Improvement rate shows no signs of slowing; your mental model of AI has a ~12-month half-life
5. The gap between frontier and commodity AI is shrinking to 12-18 months