Huawei researchers say giving AI a ‘body’ is the next step toward human-level agents

Having a body, the researchers argue, would let AI better understand action, memory, and experience.

A team of researchers from Huawei’s Noah’s Ark Lab in Paris recently published pre-print research outlining a potential framework for “embodied artificial intelligence” (E-AI), something they say will serve as the “next fundamental step in the pursuit of artificial general intelligence (AGI).”

AGI, sometimes called “human-level AI” or “strong AI,” typically refers to an artificial intelligence system capable of performing any task a human can, given the necessary resources. While there’s no clear scientific consensus on what, exactly, would qualify an AI system as a general intelligence, companies such as OpenAI have been founded solely to pursue the technology.

Large language models

With the advent of generative pre-trained transformer (GPT) technology in the late 2010s, many experts working on AGI adopted the mantra that “scale is all you need,” believing that transformers, at scales beyond what was then possible, would eventually yield an AGI model.

But the Huawei team’s paper essentially argues that large language models, such as OpenAI’s ChatGPT and Google’s Gemini, can’t understand the real world because they don’t live in it.

Per the paper:

“It is a prevalent belief that simply scaling up such models, in terms of data volume and computational power, could lead to AGI. We contest this view. We propose that true understanding … is achievable only through E-AI agents that live in the world and learn of it by interacting with it.”

Embodied artificial intelligence

For AI agents to truly interact with the real world, the researchers claim, models will need to be housed in some form of embodiment capable of perception, action, memory, and learning.

Perception, in this context, means giving the AI system the ability to obtain raw data from the real world, in real time, and to process and encode that data into a latent learning space. Essentially, the AI will need to choose what to pay attention to, with its own “eyes” and “ears,” in order to understand the real world well enough to act as a general intelligence.
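The paper describes perception abstractly rather than prescribing an implementation. As a rough illustration of the encode-into-a-latent-space idea, here is a minimal Python sketch; the sensor dimensions, the names, and the fixed random projection standing in for a trained encoder are all hypothetical, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a trained encoder: a fixed random projection
# that maps high-dimensional raw sensor readings into a small latent space.
SENSOR_DIM = 1024   # e.g., flattened pixels plus microphone samples
LATENT_DIM = 32     # compact representation downstream modules reason over

W = rng.normal(size=(LATENT_DIM, SENSOR_DIM)) / np.sqrt(SENSOR_DIM)

def perceive(raw_reading: np.ndarray) -> np.ndarray:
    """Encode one real-time sensor reading into a latent vector."""
    z = W @ raw_reading          # project into the latent space
    return np.tanh(z)            # squash into a bounded range

# One simulated frame of "eyes and ears" input:
frame = rng.normal(size=SENSOR_DIM)
latent = perceive(frame)
print(latent.shape)              # (32,) -- what the agent actually "sees"
```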

Along with perception, agents must be able to take actions and observe their outcomes. The current class of AI models is “pre-trained,” like a student handed a test and its answers at the same time. By allowing an AI to act on its own and perceive the results of its actions as new memories, the team believes agents could learn about the world the same way living creatures do: through trial and error.
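That act-observe-remember cycle is, in essence, a reinforcement-learning loop. The paper doesn’t specify an algorithm, so the sketch below is a hypothetical toy: an invented three-action environment and a simple running-average value update stand in for whatever learning machinery an embodied agent would actually use.

```python
import random

# Toy stand-in for an embodied environment: the agent must discover,
# purely from feedback, which of three "actions" yields reward.
ACTIONS = ["left", "right", "grasp"]
REWARDS = {"left": 0.0, "right": 0.2, "grasp": 1.0}  # hidden from the agent

q = {a: 0.0 for a in ACTIONS}   # the agent's learned value estimates
memory = []                     # episodic record of (action, outcome)

for step in range(500):
    # Explore occasionally; otherwise act on current beliefs.
    if random.random() < 0.1:
        action = random.choice(ACTIONS)
    else:
        action = max(q, key=q.get)

    # Act in the world and observe the outcome (here, a noisy reward).
    outcome = REWARDS[action] + random.gauss(0, 0.05)

    # Store the experience and update beliefs from it: trial and error.
    memory.append((action, outcome))
    q[action] += 0.1 * (outcome - q[action])

print(q)  # "grasp" should end up with the highest estimated value
```

In a real embodied agent the environment would be the physical world and the update rule far richer, but the loop structure is the point: act, observe, remember, update.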

Ultimately, the researchers propose a theoretical framework by which an LLM or other foundation model could one day be embodied to achieve these goals.

However, the researchers also point out that there are myriad challenges standing in the way. Not the least of these is that the most powerful LLMs currently “exist” on massive cloud networks, making embodiment a difficult proposition with today’s technology.
