Gotta catch 'em, AI

Why Anthropic’s Claude still hasn’t beaten Pokémon

Weeks later, Sonnet's "reasoning" model is struggling with a game designed for children.

Kyle Orland
A Game Boy Color playing Pokémon Red, surrounded by the tendrils of an AI, or maybe some funky glowing wires; what do AI tendrils look like, anyway?
Gotta subsume 'em all into the machine consciousness! Credit: Aurich Lawson

In recent months, the AI industry's biggest boosters have started converging on a public expectation that we're on the verge of “artificial general intelligence” (AGI)—virtual agents that can match or surpass "human-level" understanding and performance on most cognitive tasks.

OpenAI is quietly seeding expectations for a "PhD-level" AI agent that could operate autonomously at the level of a "high-income knowledge worker" in the near future. Elon Musk says that "we'll have AI smarter than any one human probably" by the end of 2025. Anthropic CEO Dario Amodei thinks it might take a bit longer but similarly says it's plausible that AI will be "better than humans at almost everything" by the end of 2027.

Last month, Anthropic presented its “Claude Plays Pokémon” experiment as a waypoint on the road to that predicted AGI future. It's a project the company said shows "glimmers of AI systems that tackle challenges with increasing competence, not just through training but with generalized reasoning." Anthropic made headlines by trumpeting how Claude 3.7 Sonnet’s "improved reasoning capabilities" let the company's latest model make progress in the popular old-school Game Boy RPG in ways "that older models had little hope of achieving."

While Claude models from just a year ago struggled even to leave the game’s opening area, Claude 3.7 Sonnet was able to make progress by collecting multiple in-game Gym Badges in a relatively small number of in-game actions. That breakthrough, Anthropic wrote, was because the “extended thinking” by Claude 3.7 Sonnet means the new model "plans ahead, remembers its objectives, and adapts when initial strategies fail" in a way that its predecessors didn’t. Those things, Anthropic brags, are "critical skills for battling pixelated gym leaders. And, we posit, in solving real-world problems too."

Over the last year, new Claude models have shown quick progress in reaching new Pokémon milestones. Credit: Anthropic

But relative success over previous models is not the same as absolute success over the game in its entirety. In the weeks since Claude Plays Pokémon was first made public, thousands of Twitch viewers have watched Claude struggle to make consistent progress in the game. Despite long "thinking" pauses between each move—during which viewers can read printouts of the system’s simulated reasoning process—Claude frequently finds itself pointlessly revisiting completed towns, getting stuck in blind corners of the map for extended periods, or fruitlessly talking to the same unhelpful NPC over and over, to cite just a few examples of distinctly sub-human in-game performance.

Watching Claude continue to struggle at a game designed for children, it’s hard to imagine we’re witnessing the genesis of some sort of computer superintelligence. But even Claude’s current sub-human level of Pokémon performance could hold significant lessons for the quest toward generalized, human-level artificial intelligence.

Smart in different ways

In some sense, it’s impressive that Claude can play Pokémon with any facility at all. When developing AI systems that find dominant strategies in games like Go and Dota 2, engineers generally start their algorithms off with deep knowledge of a game’s rules and/or basic strategies, as well as a reward function to guide them toward better performance. For Claude Plays Pokémon, though, project developer and Anthropic employee David Hershey says he started with an unmodified, generalized Claude model that wasn’t specifically trained or tuned to play Pokémon games in any way.

“This is purely the various other things that [Claude] understands about the world being used to point at video games,” Hershey told Ars. “So it has a sense of a Pokémon. If you go to claude.ai and ask about Pokémon, it knows what Pokémon is based on what it's read… If you ask, it'll tell you there's eight gym badges, it'll tell you the first one is Brock… it knows the broad structure.”

A flowchart summarizing the pieces that help Claude interact with an active game of Pokémon (click through to zoom in). Credit: Anthropic / Excalidraw

In addition to directly monitoring certain key (emulated) Game Boy RAM addresses for game state information, Claude views and interprets the game's visual output much like a human would. But despite recent advances in AI image processing, Hershey said Claude still struggles to interpret the low-resolution, pixelated world of a Game Boy screenshot as well as a human can. “Claude's still not particularly good at understanding what's on the screen at all,” he said. “You will see it attempt to walk into walls all the time.”
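
Anthropic hasn't published the harness code, but as a rough illustration, the RAM-reading side of that setup could look something like the sketch below. The addresses follow the community-documented Pokémon Red RAM map, and the read_byte/grab_screenshot hooks stand in for whatever the emulator wrapper actually exposes; treat all of it as an assumption rather than Anthropic's actual implementation.

```python
# Hypothetical sketch of the game-state side of a "Claude Plays Pokémon"-style
# harness. The RAM addresses below are the community-documented Pokémon Red
# locations; the helper functions are placeholders for a real emulator API.

from dataclasses import dataclass

ADDR_CURRENT_MAP = 0xD35E   # which map the player is currently on
ADDR_PLAYER_Y    = 0xD361   # player's Y tile coordinate
ADDR_PLAYER_X    = 0xD362   # player's X tile coordinate
ADDR_PARTY_COUNT = 0xD163   # number of Pokémon in the party


@dataclass
class GameState:
    map_id: int
    x: int
    y: int
    party_count: int
    screenshot_png: bytes  # the low-res frame Claude also has to interpret visually


def read_state(read_byte, grab_screenshot) -> GameState:
    """Pull a few key values straight from emulated RAM, plus a screenshot.

    read_byte(addr) and grab_screenshot() are stand-ins for whatever the
    emulator wrapper (e.g. a PyBoy-style interface) actually provides.
    """
    return GameState(
        map_id=read_byte(ADDR_CURRENT_MAP),
        x=read_byte(ADDR_PLAYER_X),
        y=read_byte(ADDR_PLAYER_Y),
        party_count=read_byte(ADDR_PARTY_COUNT),
        screenshot_png=grab_screenshot(),
    )
```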

Hershey said he suspects Claude’s training data probably doesn’t contain many overly detailed text descriptions of “stuff that looks like a Game Boy screen.” This means that, somewhat surprisingly, if Claude were playing a game with “more realistic imagery, I think Claude would actually be able to see a lot better,” Hershey said.

“It's one of those funny things about humans that we can squint at these eight-by-eight pixel blobs of people and say, ‘That's a girl with blue hair,’” Hershey continued. “People, I think, have that ability to map from our real world to understand and sort of grok that... so I'm honestly kind of surprised that Claude's as good as it is at being able to see there's a person on the screen.”

Even with a perfect understanding of what it’s seeing on-screen, though, Hershey said Claude would still struggle with 2D navigation challenges that would be trivial for a human. “It’s pretty easy for me to understand that [an in-game] building is a building and that I can’t walk through a building,” Hershey said. “And that's [something] that's pretty challenging for Claude to understand… It's funny because it's just kind of smart in different ways, you know?”

A sample Pokémon screen with an overlay showing how Claude characterizes the game's grid-based map. Credit: Anthropic / X

Where Claude tends to perform better, Hershey said, is in the more text-based portions of the game. During an in-game battle, Claude will readily notice when the game tells it that an attack from an electric-type Pokémon is “not very effective” against a rock-type opponent, for instance. Claude will then squirrel that factoid away in a massive written knowledge base for future reference later in the run. Claude can also integrate multiple pieces of similar knowledge into pretty elegant battle strategies, even extending those strategies into long-term plans for catching and managing teams of multiple creatures for future battles.
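
Anthropic hasn't detailed the knowledge base's format, which in practice is free-form text that Claude writes and rewrites for itself. The toy sketch below only illustrates the kind of "squirrel that factoid away" bookkeeping described above; the structure, function names, and matchup wording are all hypothetical.

```python
# Toy illustration of recording a battle observation like "Electric attacks are
# not very effective against Rock-types" for later reference. The real system
# keeps these notes as free-form text, not a structured dictionary.

knowledge_base: dict[str, list[str]] = {"type_matchups": [], "locations": []}


def record_battle_observation(attack_type: str, defender_type: str, message: str) -> None:
    """Turn the game's battle text into a reusable note, if it taught us anything."""
    if "not very effective" in message.lower():
        note = f"{attack_type} moves are weak against {defender_type}-type Pokémon"
    elif "super effective" in message.lower():
        note = f"{attack_type} moves are strong against {defender_type}-type Pokémon"
    else:
        return  # nothing new learned from this message
    if note not in knowledge_base["type_matchups"]:
        knowledge_base["type_matchups"].append(note)


# Example: the game prints "It's not very effective..." after a Thundershock
record_battle_observation("Electric", "Rock", "It's not very effective...")
```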

Claude can even show surprising “intelligence” when Pokémon’s in-game text is intentionally misleading or incomplete. “It's pretty funny that they tell you you need to go find Professor Oak next door and then he's not there,” Hershey said of an early-game task. “As a 5-year-old, that was very confusing to me. But Claude actually typically goes through that same set of motions where it talks to mom, goes to the lab, doesn't find [Oak], says, ‘I need to figure something out’… It's sophisticated enough to sort of go through the motions of the way [humans are] actually supposed to learn it, too.”

A sample of the kind of simulated reasoning process Claude steps through during a typical Pokémon battle. Credit: Claude Plays Pokemon / Twitch

These kinds of relative strengths and weaknesses when compared to “human-level” play reflect the overall state of AI research and capabilities in general, Hershey said. “I think it's just a sort of universal thing about these models... We built the text side of it first, and the text side is definitely... more powerful. How these models can reason about images is getting better, but I think it's a decent bit behind.”

Forget me not

Beyond issues parsing text and images, Hershey also acknowledged that Claude can have trouble “remembering” what it has already learned. The current model has a “context window” of 200,000 tokens, limiting the amount of relational information it can store in its “memory” at any one time. When the system’s ever-expanding knowledge base fills up this context window, Claude goes through an elaborate summarization process, condensing detailed notes on what it has seen, done, and learned so far into shorter text summaries that lose some of the fine-grained details.
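
In outline, that compression step behaves something like the sketch below. The 200,000-token figure is Claude 3.7 Sonnet's documented context window, but the threshold, the amount of recent history kept verbatim, and the summarize call are placeholders, since the real harness and its tuning aren't public.

```python
# Rough sketch of a summarize-when-full loop. count_tokens and summarize are
# stand-ins for a real tokenizer and a call back into the model; the numbers
# other than the 200K context window are assumptions.

CONTEXT_LIMIT = 200_000   # Claude 3.7 Sonnet's context window, in tokens
SUMMARIZE_AT = 180_000    # hypothetical safety margin before hitting the limit


def maybe_compress(history: list[str], count_tokens, summarize) -> list[str]:
    """When the running history nears the limit, condense the older entries.

    Anything that doesn't survive into the summary is gone for good, which is
    exactly the forgetting failure mode described above.
    """
    total = sum(count_tokens(entry) for entry in history)
    if total < SUMMARIZE_AT:
        return history
    recent = history[-20:]                         # keep the newest notes verbatim
    summary = summarize("\n".join(history[:-20]))  # condense everything older
    return [summary] + recent
```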

This can mean that Claude “has a hard time keeping track of things for a very long time and really having a great sense of what it's tried so far,” Hershey said. “You will definitely see it occasionally delete something that it shouldn't have. Anything that's not in your knowledge base or not in your summary is going to be gone, so you have to think about what you want to put there.”

A small window into the kind of "cleaning up my context" knowledge-base update necessitated by Claude's limited "memory." Credit: Claude Plays Pokemon / Twitch

More than forgetting important history, though, Claude runs into bigger problems when it inadvertently inserts incorrect information into its knowledge base. Like a conspiracy theorist who builds an entire worldview from an inherently flawed premise, Claude can be incredibly slow to recognize when an error in its self-authored knowledge base is leading its Pokémon play astray.

“The things that are written down in the past, it sort of trusts pretty blindly,” Hershey said. “I have seen it become very convinced that it found the exit to [in-game location] Viridian Forest at some specific coordinates, and then it spends hours and hours exploring a little small square around those coordinates that are wrong instead of doing anything else. It takes a very long time for it to decide that that was a ‘fail.’”

Still, Hershey said Claude 3.7 Sonnet is much better than earlier models at eventually “questioning its assumptions, trying new strategies, and keeping track over long horizons of various strategies to [see] whether they work or not.” While the new model will still “struggle for really long periods of time” retrying the same thing over and over, it will ultimately tend to “get a sense of what's going on and what it’s tried before, and it stumbles a lot of times into actual progress from that,” Hershey said.

“We’re getting pretty close…”

One of the most interesting things about observing Claude Plays Pokémon across multiple iterations and restarts, Hershey said, is seeing how the system’s progress and strategy can vary quite a bit between runs. Sometimes Claude will show it's “capable of actually building a pretty coherent strategy” by “keeping detailed notes about the different paths to try,” for instance, he said. But “most of the time it doesn't… most of the time, it wanders into the wall because it's confident it sees the exit.”

One of the biggest things preventing the current version of Claude from getting better, Hershey said, is that “when it derives that good strategy, I don't think it necessarily has the self-awareness to know that one strategy [it] came up with is better than another.” And that’s not a trivial problem to solve.

Still, Hershey said he sees “low-hanging fruit” for improving Claude’s Pokémon play by improving the model’s understanding of Game Boy screenshots. “I think there's a chance it could beat the game if it had a perfect sense of what's on the screen,” Hershey said, saying that such a model would probably perform “a little bit short of human.”

Expanding the context window for future Claude models will also probably allow those models to “reason over longer time frames and handle things more coherently over a long period of time,” Hershey said. Future models will improve by getting “a little bit better at remembering, keeping track of a coherent set of what it needs to try to make progress,” he added.

Twitch chat responds with a flood of bouncing emojis as Claude concludes an epic 78+ hour escape from Pokémon's Mt. Moon. Credit: Claude Plays Pokemon / Twitch

Whatever you think about impending improvements in AI models, though, Claude’s current performance at Pokémon doesn’t make it seem like it’s poised to usher in an explosion of human-level, completely generalizable artificial intelligence. And Hershey allows that watching Claude 3.7 Sonnet get stuck on Mt. Moon for 80 hours or so can make it “seem like a model that doesn't know what it's doing.”

But Hershey is still impressed at the way that Claude's new reasoning model will occasionally show some glimmer of awareness and “kind of tell that it doesn't know what it's doing and know that it needs to be doing something different. And the difference between ‘can't do it at all’ and ‘can kind of do it’ is a pretty big one for these AI things for me,” he continued. “You know, when something can kind of do something it typically means we're pretty close to getting it to be able to do something really, really well.”

Kyle Orland Senior Gaming Editor
Kyle Orland has been the Senior Gaming Editor at Ars Technica since 2012, writing primarily about the business, tech, and culture behind video games. He has journalism and computer science degrees from the University of Maryland. He once wrote a whole book about Minesweeper.
Staff Picks
i
I watched Claude solve a puzzle.

After getting stuck by a line of trees, he visited a building where the NPC mentioned that pokemon can use the CUT command while unconscious. And Claude was like-- "that's a HINT!" And then he walked outside and cut down the unique looking tree.

Like so many AI things, it was both stupid and magical at the same time. Baby steps.
katie3anderson
I know there's no way to really do it without retraining the entire thing on a dataset that's been purged of pokemon, but I'd love to see how it would handle a game like this where everything is novel. When I played it as a kid I hadn't pored through strategy guides or even watched the TV show to know anything about it. There's a ton in the game to help teach you, and digging through the party details or pokedex gives you a lot of hints about how things work. You learn where you need to go from a combination of exploration/experimentation and speaking with NPCs. Claude came into this with a pretty decent understanding of the mechanics of the games, the individual moves and pokemon, and the general route needed to beat the game.

Finding a game with the gameboy-like simplicity but general depth of Pokemon that was created after training would be difficult enough, but most modern games will also benefit from player expectations built up over the decades. Pokemon was the first exposure to that sort of game for a lot of young players who didn't get much benefit from their existing knowledge of games.
T
“You know, when something can kind of do something it typically means we're pretty close to getting it to be able to do something really, really well.”
Uhhh what? This is the exact inverse of how technology development works. Getting something to kinda sorta work sometimes is like the first 2% of the effort, maybe less.

The hubris of these people is just mind blowing to me. It’s like they have literally no idea what they are doing, from a tech (nevermind product) development standpoint.
c
If anything, the article understates just how bad Claude is at planning.

Here is a map of Vermilion City. It literally consists of 7 enterable buildings. But Claude would spend 6-8 hours at a time just wandering around, occasionally going in and out of buildings, desperately trying to find the entrance to the S.S. Anne.

It makes sense that it would get confused. It knew the coordinates of the entrance were directly south of its current position at the cluster of buildings, because it had been there before and recorded the coordinates. So it would go south. But there's no straight path from there to the dock (you have to go east first), so Claude would hit a dead end and head back.

What's harder to justify is why it would make that same loop 20 times in a row. And then wander off somewhere else for a few hours, and come back and do it all over again.

It essentially has no long-term memory. It does have an elaborate system of note-taking programmed in, but it rarely takes useful notes, and when it does, it usually doesn't pay attention to them.

It never comes up with meta-strategies, like "always hug the right wall" or "use the notes to count how many times you've been to each place". (Actually, I heard that it thought of the first one at some point, but it didn't manage to execute on it.) If the prompt contained these meta-strategies, it might help its performance, but that would be cheating, and it still would probably have a hard time executing.

When it does make progress, it's often by dumb luck.

But I don't want to be too negative.

For one thing, I find the stream fascinating to watch. I hate AI hype as much as the next person, and that includes the way Anthropic has talked about the Pokémon bot as if it were more capable than it really is. But when I actually watch the stream, I can't help cheering the bot on. The short-term thinking, the failure to understand what's right there on the screen, it all makes the bot come across like a dog or cat. It's not demonstrating AGI, but it's doing its best!

(It's actually a lot more entertaining this way than it would be if it were more competent. Just like DeepDream was a lot more entertaining than modern image generators, and Markov chains were more entertaining than LLMs.)

Also… I tried running my own clone of the experiment with other LLMs from Google and OpenAI. They did much worse. To be fair, I didn't implement all the bells and whistles, which partly explains the underperformance. But in their attempts to get past the first ~2 rooms of the game, they would constantly do things I've never seen happen in the ClaudePlaysPokemon stream, like:
  • Try to talk to an NPC, fail, then hallucinate the dialogue that the NPC is supposed to say.
  • Think a text box is still there after it's been dismissed; or think the menu has been dismissed when it's still there. Then try to move, fail, and conclude the game is broken.
Which suggests that Claude 3.7 really is a meaningful step up from the state of the art from like half a year ago. As much as LLMs seem to be reaching a performance asymptote, there's still room to grow. For better or worse, maybe it won't be too long before we see an LLM that can navigate Pokémon with confidence.