
What are the best AI tools for research? Nature’s guide


View of multiple screens displaying the logo of DeepSeek and the logo of OpenAI's artificial intelligence chatbot ChatGPT.

Credit: Lionel Bonaventure/AFP via Getty

New and seemingly more impressive artificial-intelligence (AI) tools are released almost weekly, and researchers are flocking to try them out. Whether they are looking to edit manuscripts, write code or generate hypotheses, researchers have more generative AI tools to choose from than ever before.

Each large language model (LLM) is suited to different tasks. Some are available through free chatbots, whereas others are accessed through a paid-for application programming interface (API), which means they can be integrated into other software. A few can also be downloaded, allowing researchers to build their own custom models.
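To make that distinction concrete, here is a minimal sketch in Python of what API access looks like, using OpenAI's official client library; the model name and prompt are illustrative placeholders, and an API key is assumed to be set in the environment.

# Minimal sketch: querying an LLM through a paid-for API rather than a chatbot.
# Assumes the 'openai' Python package is installed and the OPENAI_API_KEY
# environment variable is set; the model name and prompt are illustrative.
from openai import OpenAI

client = OpenAI()  # reads the API key from the environment

response = client.chat.completions.create(
    model="o3-mini",  # any model the provider exposes through its API
    messages=[{"role": "user",
               "content": "Summarize this abstract in two sentences: ..."}],
)
print(response.choices[0].message.content)

Because the call is just code, its output can be fed straight into other software, such as a manuscript-checking script, which is what sets API access apart from a chatbot window.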

Although LLMs produce human-like responses, they all remain too error-prone to be used on their own, says Carrie Wright, a data scientist at the Fred Hutchinson Cancer Center, headquartered in Seattle, Washington.

So which LLM is best for what task? Here, researchers share their current favourites with Nature to help guide those in need.

o3-mini (the reasoner)

OpenAI, based in San Francisco, California, introduced the world to LLMs in 2022 with its free-to-use ChatGPT bot. Scientists have mainly used the bot to look up information or as a writing assistant, for example to draft abstracts, but newer models are broadening the technology’s potential uses. Last September, in the firm’s most significant advance since then, OpenAI wowed scientists with its o1 ‘reasoning model’, which it followed with the more advanced o3 in December. Both reasoning models work more slowly than an LLM alone does, because they have been trained to answer queries in a step-by-step way. This “chain of thought” process, aimed at simulating human reasoning, has helped them to smash tough benchmarks in science and mathematics. It has also made them good at technical tasks, such as solving coding issues and reformatting data.

After DeepSeek, a little-known Chinese start-up based in Hangzhou, launched a rival reasoner on 20 January, OpenAI responded with a range of new tools. These include a speedy o3-mini, a reasoner that is free for registered chatbot users, and ‘deep research’, which allows some paying subscribers to create reports that synthesize information, with citations, from hundreds of websites, akin to carrying out a literature review. The models excel when used in combination, says Andrew White, a chemist and AI expert at FutureHouse, a start-up in San Francisco.

When it comes to tasks such as picking apart unfamiliar concepts in a new mathematical proof, o3-mini does a “really good job”, says Simon Frieder, a mathematician and AI researcher at the University of Oxford, UK. But even the best models “are still not even close to rivalling a mathematician”, he says.

DeepSeek (the all-rounder)

DeepSeek-R1, launched last month, has abilities on a par with o1’s, but is available through an API at a fraction of the cost. It also stands apart from OpenAI’s models because it is open weight, meaning that although its training data have not been released, anyone can download the underlying model and tailor it to their specific research project. R1 has “just unlocked a new paradigm” in which communities, particularly those with relatively few resources, can build specialized reasoning models, says White.

Running the full model requires access to powerful computing chips, which many academics lack. But researchers such as Benyou Wang, a computer scientist at the Chinese University of Hong Kong, Shenzhen, are creating versions that can run or train on a single machine. Like o1, DeepSeek-R1 is strongest at maths problems and writing code. But it is also good at tasks such as generating hypotheses, says White. This is because DeepSeek has opted to publish the model’s ‘thought processes’ in full, which allows researchers to refine their follow-up questions and ultimately improve its outputs, he says. Such transparency could also be hugely powerful for medical diagnostics. Wang is adapting R1 in experiments that use the model’s reasoning-like powers to build “a clear and logical pathway from patient assessment to diagnosis and treatment recommendation”, he says.
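To illustrate what ‘open weight’ means in practice, the sketch below uses the Hugging Face transformers library to download one of the smaller distilled R1 variants and run it locally; the model name and prompt are illustrative, and a machine with a capable GPU is assumed.

# Minimal sketch: downloading and running a distilled DeepSeek-R1 variant
# locally with Hugging Face transformers. Assumes 'transformers', 'torch'
# and 'accelerate' are installed and enough GPU memory is available; the
# model name is one of the smaller distilled releases.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-7B",
    device_map="auto",
)

result = generator(
    "Propose a testable hypothesis linking sleep quality and memory.",
    max_new_tokens=512,
)
# R1-style models print their chain of thought, wrapped in <think> tags,
# before the final answer, which is what makes the reasoning inspectable.
print(result[0]["generated_text"])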

DeepSeek-R1 has some cons. The model seems to have a particularly long ‘thought’ process, which slows it down and makes it less useful for looking up information or brainstorming. Concerns about the security of data input into its API and chatbot have led several governments to ban workers at national agencies from using the chatbot. DeepSeek also seems to have taken fewer measures than its commercial competitors to prevent its models from generating harmful outputs. Adding filters to block such outputs (instructions for making weapons, for instance) takes time and effort. Although it is unlikely that this was done on purpose, “the lack of guard rails is worrisome”, says Frieder.

OpenAI has also suggested that DeepSeek may have “inappropriately distilled” its models, referring to a method for training a model on another algorithm’s outputs, which OpenAI’s conditions of use prohibit.

DeepSeek could not be reached for comment on these criticisms before this article was published.

Some researchers see such distillation as commonplace and are happy to use R1, but others are wary of using a tool that could be subject to future litigation. Scientists who use R1 could be forced to retract papers if use of the model were deemed to violate a journal’s ethical standards, says Ana Catarina De Alencar, a lawyer at EIT Manufacturing in Paris who specializes in AI law. A similar situation could apply to the use of models from OpenAI and other firms accused of intellectual-property violations, says De Alencar. News organizations claim that the firms used journalistic content to train their models without permission.

Llama (the workhorse)

Llama, a family of open-weight models first released by Meta AI in Menlo Park, California, in 2023, has long been a go-to LLM for the research community; its versions have been downloaded more than 600 million times through the open-science platform Hugging Face alone. The fact that it can be downloaded and built on is “probably why Llama has been embraced by the research community”, says Elizabeth Humphries, a data scientist at the Fred Hutchinson Cancer Center.

Being able to run an LLM on personal or institutional servers is essential when working with proprietary or protected data, to avoid sensitive information being fed back to other users or to the developers, says Wright.
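As a minimal sketch of what that looks like, the Python snippet below loads an open-weight model with the transformers library and runs it entirely on local hardware; the model name and prompt are illustrative choices, and the gated Llama weights are assumed to have already been downloaded after requesting access.

# Minimal sketch: running an open-weight model entirely on local hardware,
# so that sensitive text never leaves the machine. Assumes 'transformers',
# 'torch' and 'accelerate' are installed and that the gated Llama weights
# have already been downloaded; the model name and prompt are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-3.1-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

prompt = "De-identify the following clinical note: ..."  # stays on-premises
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))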

Researchers have built on Llama’s models to make LLMs that predict materials’ crystal structure, as well as to simulate the outputs of a quantum computer. Tianlong Chen, a machine-learning scientist at the University of North Carolina at Chapel Hill, says Llama was a good fit for simulating a quantum computer because it was relatively easy to adapt it to understand specialized quantum language.
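Building on a base model in this way usually means fine-tuning it on domain-specific text. Below is a minimal sketch of one common recipe, low-rank adaptation (LoRA) with the peft library; the model name, target modules and training data are placeholders rather than the set-ups used in the studies above.

# Minimal sketch: adapting an open-weight model to a specialized domain with
# low-rank adaptation (LoRA) via the 'peft' library. The model name and
# target modules are illustrative, not the exact set-up of the studies above.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct", device_map="auto"
)

config = LoraConfig(
    r=8,                                  # rank of the low-rank updates
    lora_alpha=16,                        # scaling factor for the updates
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)
model.print_trainable_parameters()  # only a small fraction of weights train

# From here the wrapped model is trained as usual, for example with the
# transformers Trainer, on domain text such as crystal-structure records.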

But Llama requires users to request permission to access it, which is a minor point of friction for some, says White. As a result, other open models, such as OLMo, developed by the Allen Institute for Artificial Intelligence in Seattle, or Qwen, built by the Chinese firm Alibaba Cloud, based in Hangzhou, are now often the first choice in research, he adds. DeepSeek’s efficient underlying V3 model is also a rival base for building scientific models.

Claude (the coder)


