2025: A Look Back

Our Perspectives highlights the voices of leadership and experts within the Redhorse Corporation on trends, topics and insights that are of interest to our customers and partners. The views and opinions expressed here are those of each author and not of the company as a whole.

To read more and catch new insights regularly from our Chief Technology Officer, Matt Teschke, connect with or follow him on LinkedIn.

With most of 2025 behind us, I wanted to take a look back at the predictions I made at the beginning of the year. Some of them may seem obvious in retrospect, but that’s largely because developments in AI have continued to accelerate – and don’t show much sign of slowing down.

It was another exciting year, so let’s jump right into it.

Prediction: Test-time compute takes off

Verdict: Nailed it!

This one seems obvious in retrospect because nearly all major models now use test-time compute (“thinking”, now the commonly used term), whereas thinking models were only just starting to be released at the end of 2024. The METR chart is a great way to visualize this trend, as it shows how long models can operate independently and still complete the task at least 50% of the time. From this chart, we can see that we went from about 10 minutes at the beginning of 2024 to over 2.5 hours at the end of 2025. If anything close to this trend holds, we’ll have models in 2026 that can work independently for days. While this may seem like quite the claim if you aren’t following the industry closely, many people report that Anthropic’s Claude Opus 4.5 (released in November 2025) is able to operate independently and reliably for many hours at a time.

METR Chart 2025: Measuring AI Ability to Complete Long-Running Tasks
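
As a rough sanity check on that extrapolation, here is a minimal Python sketch. It assumes the two figures quoted above (about 10 minutes at the start of 2024, about 2.5 hours at the end of 2025), treats the gap between them as 24 months, and assumes growth stays exponential at a constant rate. The function names are illustrative, and none of this is METR's own methodology.

```python
import math

# Approximate figures quoted in the text above (readings of the METR chart).
START_MINUTES = 10.0     # about 10 minutes at the beginning of 2024
END_MINUTES = 150.0      # about 2.5 hours at the end of 2025
ELAPSED_MONTHS = 24.0    # assumption: the two readings are 24 months apart


def doubling_time_months(start, end, elapsed_months):
    """Months per doubling implied by exponential growth from start to end."""
    doublings = math.log2(end / start)
    return elapsed_months / doublings


def project_minutes(current, months_ahead, doubling_months):
    """Horizon after months_ahead, assuming the doubling time stays constant."""
    return current * 2 ** (months_ahead / doubling_months)


doubling = doubling_time_months(START_MINUTES, END_MINUTES, ELAPSED_MONTHS)
horizon_2026 = project_minutes(END_MINUTES, 12.0, doubling)

print(f"Implied doubling time: {doubling:.1f} months")
print(f"Projected horizon at end of 2026: {horizon_2026 / 60:.1f} hours")
```

With these inputs the sketch prints an implied doubling time of about 6.1 months and a projected horizon of about 9.7 hours at the end of 2026, which is already more than a standard working day. Reaching multi-day horizons within 2026 would depend on the trend speeding up or on the kind of scaffolding discussed below.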

These models are able to operate independently for even longer periods when they have the right scaffolding or system to support them. Applications like Claude Code and Replit provide additional functionality, such as the use of multiple agents, to extend the time these models can operate reliably. Developers who have learned to harness the power of these tools see their productivity increase by 2x or more.

Prediction: Innovative UX takes off (e.g. Whisk, tldraw computer)

Verdict: Largely a miss

While alternative paradigms to the chat interface hold a lot of promise, few have gained significant share relative to what was popular in 2024: chat boxes and AI in IDEs and consoles (the applications developers use to write code). Other UIs are better in many use cases, but the inertia behind the existing interfaces has proven to be quite strong, and those interfaces are “good enough” for many tasks.

NotebookLM from Google is one of the most interesting alternatives, especially since the team regularly releases new features. It is a great tool for understanding documents, since it provides a range of options, such as custom podcasts, that help people digest them.

I am also really excited for low-code applications which enable users to create simple, but potentially very useful, workflows that make use of LLMs. Expanding access to the power of LLMs beyond developers could unlock a lot of value for companies.

Prediction: Agents as coworkers

Verdict: Correct for software dev, lagging elsewhere

When I made this prediction a year ago, I was envisioning a semi-autonomous teammate that would live in our work environments. I think we are still a little ways off from that, but for software development we are definitely there. The newest models and applications (e.g. Claude Code, Devin, Factory AI, OpenHands) enable users to assign tasks much like they would to a junior engineer. This is changing how software developers work, as they need to focus on building the system within which the AI will operate, plan for the architecture and new features, and review the output of the coding agents. Developers who have learned to adopt these tools easily increase their output by at least 2x, with many seeing an even larger gain in productivity.

The practical implication is that developers can not only finish their work more quickly and with higher quality, but also take on projects or features they otherwise would not have had time for. I’ve seen this in my personal life: since I can create a working app in just a couple of hours, I’ll take on projects I never would have considered before.

Prediction: OpenAI relinquishes its crown

Verdict: Nailed it (for now…)

Last year I wrote: “I expect Google and Anthropic to continue releasing new models on a regular basis, and by the end of 2025 one of them will be recognized as the leading AI company against which others (including OpenAI) are compared.”

With Google’s release of Gemini 3.0 Pro and Anthropic’s release of Claude Opus 4.5, I feel like I nailed this one. When Gemini 3.0 Pro was released on November 18, it shot to the top of nearly every leaderboard and impressed users, especially with its ability to design application front ends. Google’s lead didn’t last long, as Anthropic released Opus 4.5 on November 24. Anthropic’s model leapfrogged Google’s on many leaderboards, and many users (myself included) have found it to be the best for coding. Anthropic even addressed price, which had been an issue with their Opus series of models.

The next year will continue to see competition from the leading AI labs, plus, I’m sure, a few surprises from China and others. There’s not much we can be sure about, but I am confident we’ll have another year of rapidly improving models. (And if the rumors are true, we’ll see a few more big releases before the end of 2025.)

Update December 12, 2025: I knew that by writing this article before 2025 was truly over, I was running the risk of OpenAI releasing another model that could beat Anthropic’s and Google’s releases. Well, they did just that – releasing GPT 5.2 on December 11. GPT 5.2 seems like a very impressive model, leading on many benchmarks. I look forward to seeing how it works in practice!

Prediction: Humanoid Robots in the Home

Verdict: One year early

I should know by now that hardware doesn’t develop at the same pace as software, because iterating in the physical world faces additional constraints. So while we don’t have robots that can help with the dishes, we are close. Companies including Figure, Sunday, and 1X seem poised to start delivering useful robots for home use in 2026.

Like many first-generation products, these will be expensive and full of rough edges, but the trajectory is clear. These robots will continue becoming more capable while decreasing in price, so that in a few years many of us will be able to have a household assistant to help with chores.

Prediction: AI plays significant role in scientific or math breakthrough

Verdict: One year (or maybe month?) early

OpenAI and Google surprised many with their models’ success in math and science competitions this year, achieving gold medal scores in some of the toughest events. While impressive, the takeaway for me is the trajectory these models are on, as most people in the industry didn’t expect them to be able to perform at this level for another year or two.

If you follow the AI field on social media, you see even more hints at what is possible. Terence Tao, one of the greatest living mathematicians, has talked about how he uses AI and that it can be useful to researchers. The leading AI companies are investing heavily in internal labs that will demonstrate how AI can support progress in science and medicine. There’s been a lot of interest this month in seeing if AI can independently solve open Erdős problems, with seemingly some success. A new startup

And on November 24, 2025, the Department of Energy announced the Genesis Mission, a truly ambitious effort to combine the government’s vast scientific data holdings with the AI and scientific infrastructure to make significant advances. The mission will have significant funding and the support of many leading tech companies – so I’m optimistic about progress in 2026.

Conclusion

If you follow AI closely, it can be easy to get lost in relatively minor product or feature announcements. While it’s fun to track these advances, the real impact will come from longer-term trends and a few major product releases each year. That’s why, as much as I like getting hands-on with the latest models, I take the time to step back and think about the trends that really matter. 2025 was everything we could have hoped for – I can’t wait to see what 2026 has in store!

