CULTURE

Why A.I. Didn’t Transform Our Lives in 2025


One year ago, Sam Altman, the C.E.O. of OpenAI, made a bold prediction: “We believe that, in 2025, we may see the first AI agents ‘join the workforce’ and materially change the output of companies.” A couple of weeks later, the company’s chief product officer, Kevin Weil, said at the World Economic Forum conference in Davos in January, “I think 2025 is the year that we go from ChatGPT being this super smart thing . . . to ChatGPT doing things in the real world for you.” He gave examples of artificial intelligence filling out online forms and booking restaurant reservations. He later promised, “We’re going to be able to do that, no question.” (OpenAI has a corporate partnership with Condé Nast, the owner of The New Yorker.)

This was no small boast. Chatbots can respond directly to a text-based prompt—by answering a question, say, or writing a rough draft of an e-mail. But an agent, in theory, would be able to navigate the digital world on its own, and complete tasks that require multiple steps and the use of other software, such as web browsers. Consider everything that goes into making a hotel reservation: deciding on the right nights, filtering based on one’s preferences, reading reviews, searching various websites to compare rates and amenities. An agent could conceivably automate all of these activities. The implications of such a technology would be immense. Chatbots are convenient for human employees to use; effective A.I. agents might replace the employees altogether. The C.E.O. of Salesforce, Marc Benioff, who has claimed that half the work at his company is done by A.I., predicted that agents will help unleash a “digital labor revolution,” worth trillions of dollars.


2025 was heralded as the Year of the A.I. Agent in part because, by the end of 2024, these tools had become undeniably adept at computer programming. A demo of OpenAI’s Codex agent, from May, showed a user asking the tool to modify his personal website. “Add another tab next to investment/tools that is called ‘food I like.’ In the doc put—tacos,” the user wrote. The chatbot quickly carried out a sequence of interconnected actions: it reviewed the files in the website’s directory, examined the contents of a promising file, then used a search command to find the right location to insert a new line of code. After the agent learned how the site was structured, it used this information to successfully add a new page that featured tacos. As a computer scientist myself, I had to admit that Codex was tackling the task more or less as I would. Silicon Valley grew convinced that other difficult tasks would soon be conquered.

As 2025 winds down, however, the era of general-purpose A.I. agents has failed to materialize. This fall, Andrej Karpathy, a co-founder of OpenAI, who left the company and started an A.I.-education project, described agents as “cognitively lacking” and said, “It’s just not working.” Gary Marcus, a longtime critic of tech-industry hype, recently wrote on his Substack that “AI Agents have, so far, mostly been a dud.” This gap between prediction and reality matters. Fluent chatbots and reality-bending video generators are impressive, but they cannot, on their own, usher in a world in which machines take over many of our activities. If the major A.I. companies cannot deliver broadly useful agents, then they may be unable to make good on their promises of an A.I.-powered future.

The term “A.I. agents” evokes ideas of supercharged new technology reminiscent of “The Matrix” or “Mission: Impossible—The Final Reckoning.” In truth, agents are not some kind of customized digital brain; instead, they are powered by the same type of large language model that chatbots use. When you ask an agent to tackle a chore, a control program—a straightforward application that coördinates the agent’s actions—turns your request into a prompt for an L.L.M. Here’s what I want to accomplish, here are the tools available, what should I do first? The control program then attempts any actions that the language model suggests, tells it about the outcome, and asks, Now what should I do? This loop continues until the L.L.M. deems the task complete.
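
In code, the arrangement is almost embarrassingly simple. Here is a minimal sketch, in Python, of the loop described above; the function names, the message format, and the “DONE” convention are illustrative assumptions of mine, not any company’s actual implementation:

    from typing import Callable, Dict

    def run_agent(task: str,
                  llm: Callable[[str], str],
                  tools: Dict[str, Callable[[str], str]]) -> str:
        """Ask the model what to do, run it, report back, repeat."""
        history = [
            f"Goal: {task}",
            f"Available tools: {', '.join(tools)}",
            "What should I do first?",
        ]
        while True:
            reply = llm("\n".join(history))  # e.g., "read_file notes.txt"
            if reply.startswith("DONE"):
                return reply  # the model deems the task complete
            name, _, argument = reply.partition(" ")
            tool = tools.get(name)
            outcome = tool(argument) if tool else f"unknown tool: {name}"
            history += [f"Action: {reply}",
                        f"Outcome: {outcome}",
                        "Now what should I do?"]

Everything that looks like intelligence here lives in the language model; the control program is little more than a while loop that ferries text back and forth.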

This setup turns out to excel at automating software development. Most of the actions required to create or modify a computer program can be implemented by entering a limited set of commands into a text-based terminal. These commands tell a computer to navigate a file system, add or update text in source files, and, if needed, compile human-readable code into machine-readable bits. This is an ideal setting for L.L.M.s. “The terminal interface is text-based, and that is the domain that language models are based on,” Alex Shaw, the co-creator of Terminal-Bench, a popular tool used to evaluate coding agents, told me.
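
To see why, consider what such a toolbox looks like in practice. The wrappers below are a hypothetical sketch, not any particular agent’s interface, but they capture the point: each tool takes text in and hands text back, which is precisely the medium a language model operates in:

    import subprocess
    from pathlib import Path

    def list_files(directory: str) -> str:
        """Navigate the file system."""
        return "\n".join(sorted(str(p) for p in Path(directory).iterdir()))

    def read_file(path: str) -> str:
        """Examine the contents of a source file."""
        return Path(path).read_text()

    def write_file(spec: str) -> str:
        """Add or update a file; spec is the path, a newline, then the contents."""
        path, _, text = spec.partition("\n")
        Path(path).write_text(text)
        return f"wrote {len(text)} characters to {path}"

    def run_command(cmd: str) -> str:
        """Compile, test, or search, returning whatever the terminal prints."""
        result = subprocess.run(cmd, shell=True, capture_output=True, text=True)
        return result.stdout + result.stderr

Handed to a control loop like the one sketched earlier, these four functions are enough for an agent to explore a codebase, modify it, and check its work.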

More generalized assistants, of the sort envisioned by Altman, would require agents to leave the comfortable constraints of the terminal. Since most of us complete computer tasks by pointing and clicking, an A.I. that can “join the workforce” probably needs to know how to use a mouse—a surprisingly difficult goal. The Times recently reported on a string of new startups that have been building “shadow sites”—replicas of popular webpages, like those of United Airlines and Gmail, on which A.I. can analyze how humans use a cursor. In July, OpenAI released ChatGPT Agent, an early version of a bot that can use a web browser to complete tasks, but one review noted that “even simple actions like clicking, selecting elements, and searching can take the agent several seconds—or even minutes.” At one point, the tool got stuck for nearly a quarter of an hour trying to select a price from a real-estate site’s drop-down menu.


