Four Takeaways from Stanford's Annual AI Report

June 10, 2025 / Steffen Bartschat

Perusing the AI Index Report, published by Stanford’s Institute for Human-Centered Artificial Intelligence, has become an annual springtime ritual in Silicon Valley. It’s a great resource for looking past the AI media hype and seeing the real data on the advances made over the past year. Here are my key takeaways from the 2025 report:

1 – We have not yet run out of training data, but we’re getting closer. (page 59)

The Epoch AI research institute originally predicted we might run out as soon as 2024; it has now pushed that date to “between 2026 and 2032”. Of course, with GPU hardware performance doubling every two years and the temptation to “overtrain” models to make the inference portion (i.e. when you get to ask the chatbot your questions) more efficient, the voracious appetite for training data will only increase. There is a proposed solution to this problem: synthetic data. This is where you ask another AI to create “fake” training data for your models. Stanford cites mixed results so far for this strategy – it worked in some cases and not in others. Put me in the skeptic category on synthetic data. Ultimately, I believe this problem will be solved by unleashing scores of robots to interact with and learn from the physical world, the same way we teach our human neural networks. Autonomous driving companies such as Waymo and Tesla are already doing this!

2 – Inference costs have dropped dramatically – is that good news for the search incumbents, like Google? (page 64)

The numbers are a bit difficult to decipher, but it appears that a standard GPT-3.5-level interaction has gotten dramatically cheaper – by a factor of 280. This explains why Google is now able to inject an “AI Overview” into every search query we submit, in an attempt to hold its market share against the chatbots. However, GPT-3.5 is now over 2.5 years old, so if you want to use the newest (and best) models, you end up right back at those high-end inference prices. Conclusion: sell your Google stock – their business model can only afford to give you insight from 2-year-old AI tech – and enjoy the investor-subsidized AI insight from the latest models while it lasts…
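To get a feel for what a 280-fold drop means, here is a quick back-of-the-envelope sketch. Only the factor of 280 comes from the report as cited above; the starting price is a made-up illustrative number, and the function name is mine.

```python
# Only the 280x factor is taken from the post; everything else is illustrative.
COST_REDUCTION_FACTOR = 280


def cost_after_drop(original_cost: float, factor: float = COST_REDUCTION_FACTOR) -> float:
    """Return the cost after it has fallen by the given factor."""
    return original_cost / factor


# Hypothetical starting price in USD per million tokens (an assumption, not a quoted figure).
hypothetical_old_price = 20.0
new_price = cost_after_drop(hypothetical_old_price)

# Share of the original cost that remains after a 280x reduction.
remaining_share = 1 / COST_REDUCTION_FACTOR

print(f"New price: ${new_price:.4f} per million tokens")
print(f"Remaining share of original cost: {remaining_share:.2%}")
```

In other words, whatever the old price was, the same quality of answer now costs well under half a percent of it, which is what makes attaching an AI answer to every search query plausible.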

In the era after “China shocks the world with DeepSeek”, what has happened to the costs of training the most sophisticated models? Well, they continue to go up, despite more efficient processors being available. DeepSeek’s $6M training price tag is, for now, just a blip compared to the $100M+ costs for the best models developed here in the US (with hopefully better performance). Between the competition from China, the continuing huge expenses for developing and operating new models, and the limited revenue generated by monthly subscriptions, I predict a reckoning for San Francisco’s AI majors. So… don’t take the proceeds from your Google share sales and put them into OpenAI… yet.

3 – AI Agents are “far from ready” (page 146)

Right now, the best models have only a 36% success rate on the VisualAgentBench benchmark. Apparently, most agent tasks are too complex, involving “diverse, dynamic and variable scenarios”. This one surprised me, given all the talk about AI agents being right around the corner. I recently had some positive experiences with a first rollout of the technology in customer support: I negotiated a lower price for my SiriusXM subscription, and the entire transaction was handled by voice AI with a natural conversational flow. On Amazon, I was able to get an errant shipment replaced through a natural interaction with their customer service chatbot. This was the first year in which companies trusted the AI to fully perform these functions – no waiting for a human to provide final approval. But these agents are very specialized. We still seem quite a way off from a “Siri” agent that can manage your life automatically.

4 – Waymo is making progress (page 158)

Another interesting section covers data on Waymo’s self-driving car deployments in San Francisco and Phoenix. The company’s cars have now logged over 31 million miles without a human driver. During that time, they reported 28 incidents that caused injuries and 10 that involved airbag deployment. That is less than a quarter of the number of incidents that human-driven vehicles would be involved in over the same mileage. Waymo does seem to perform slightly worse than humans when it comes to insurance claims for property damage. While the technical progress and the cited statistics are impressive, it is unfortunately only a matter of time before Waymo is involved in a fatal accident. So far, no US company’s robotaxi effort has survived such an event: both Uber and Cruise exited the business soon after being involved in fatal or near-fatal crashes. The other challenge for autonomous transportation is the unit economics… will Waymo vehicles ever get cheap enough for the company to make money on each ride it provides?
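For readers who prefer rates to raw counts, the Waymo safety figures cited above can be put on a per-million-mile basis with a few lines of arithmetic. This is a rough sketch using only the numbers quoted in this post; it treats “over 31 million miles” as exactly 31 million, and it assumes the “less than a quarter” comparison applies to the injury incidents, so the implied human figure is only a lower bound.

```python
# Figures as quoted in this post (mileage rounded down to 31 million).
MILES_DRIVEN = 31_000_000
INJURY_INCIDENTS = 28
AIRBAG_INCIDENTS = 10


def per_million_miles(incidents: int, miles: int) -> float:
    """Convert a raw incident count into incidents per million miles."""
    return incidents / (miles / 1_000_000)


injury_rate = per_million_miles(INJURY_INCIDENTS, MILES_DRIVEN)
airbag_rate = per_million_miles(AIRBAG_INCIDENTS, MILES_DRIVEN)

# "Less than a quarter" of the human figure implies human drivers would have
# had MORE than four times as many incidents over the same mileage.
# Assumption: the comparison applies to the injury count.
implied_human_injury_floor = INJURY_INCIDENTS * 4

print(f"Injury incidents per million miles: {injury_rate:.2f}")
print(f"Airbag deployments per million miles: {airbag_rate:.2f}")
print(f"Implied human injury incidents over the same miles: more than {implied_human_injury_floor}")
```

That works out to a bit under one injury incident per million driverless miles, against an implied human baseline of more than 112 such incidents over the same distance.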