by Alex Kantrowitz at Big Technology: In Deepak Pathak’s telling, nobody is building human-level artificial intelligence with language alone. You could train a large language model on billions of descriptions of gravity, but it would never conceptualize it, since it has never experienced the real world. Train it with every physics textbook on earth, and it still can’t visualize what happens when you drop a ball from your hand. With this natural limitation, it hallucinates.
Pathak, a Carnegie Mellon professor and former Meta AI researcher, suggests that today’s leading AI research labs are perhaps too focused on building artificial general intelligence — AI with human-level thought and dexterity — via raw data and compute. To get to ‘AGI,’ he says, the technology has to go pre-verbal, and real world. AI going straight to language is like giving answers to a test without teaching the course. The solutions might tell you something, but your learning is limited.
“The recipe of LLMs is data. Nothing else but data,” Pathak tells me in a video call this week. “Physicality, action, is the base framework for building intelligence.”
So Pathak is trying to help artificial intelligence take its next leap forward by teaching it how to understand the physical world. He’s working to build ‘sensory motor common sense,’ as he calls it, into AI models. The idea is for AI to go out in a natural environment, learn about it on its own, find its way around, and adapt to its surroundings. Think of it as the journey animals first took to language: sensing an environment, then finding ways to move within it. Only with that foundation mastered does it make sense to verbalize.
To build this common sense into AI, robotics is the natural path forward, but not via the predetermined movements that are most common today (the term ‘acting robotic’ is what Pathak wants to get away from). Pathak is instead dropping robots into totally new environments and giving them nearly limitless opportunities to figure out how to move around and adapt to their surroundings.
Using a form of AI called adaptive reinforcement learning, Pathak initially trains the robots in simulation. He gives them a goal to work toward, and allows them to learn from each failure until they get there on their own. After the robots learn how to move in simulation, their ‘brain’ is transposed into a physical machine and they engage the physical world, building a deeper understanding of how it works.
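For readers who want a concrete picture of that trial-and-error loop, here is a minimal sketch. It is not Pathak’s method: it uses plain tabular Q-learning, a textbook reinforcement learning algorithm, on a made-up five-cell corridor where the only reward comes from reaching the last cell. The corridor, the function names, and every number are assumptions chosen for illustration.

```python
import random

N_CELLS = 5          # hypothetical corridor: cells 0..4, goal at cell 4
GOAL = N_CELLS - 1
ACTIONS = (-1, +1)   # step left or step right


def step(state, action):
    """Toy simulator: move, clamp to the corridor, reward only at the goal."""
    nxt = min(max(state + action, 0), GOAL)
    reward = 1.0 if nxt == GOAL else 0.0
    return nxt, reward, nxt == GOAL


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning: try, fail, update the value estimates, repeat."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_CELLS)]  # q[state][action_index]
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap the length of each attempt
            if rng.random() < epsilon:
                a = rng.randrange(2)  # explore: pick a random action
            else:
                # Exploit: best-known action, breaking ties at random.
                best = max(q[state])
                a = rng.choice([i for i in (0, 1) if q[state][i] == best])
            nxt, reward, done = step(state, ACTIONS[a])
            # Nudge the estimate toward reward plus discounted future value.
            target = reward + (0.0 if done else gamma * max(q[nxt]))
            q[state][a] += alpha * (target - q[state][a])
            state = nxt
            if done:
                break
    return q


def greedy_policy(q):
    """Read off the learned action (-1 or +1) for every non-goal cell."""
    return [ACTIONS[max((0, 1), key=lambda i: q[s][i])] for s in range(GOAL)]


if __name__ == "__main__":
    print(greedy_policy(train()))
```

Nothing in the code tells the agent which way the goal lies. It starts out wandering, and the reward at the far end gradually propagates back through the value table until stepping right looks best from every cell. Real robot training swaps the corridor for a physics simulator and the table for a neural network, but the learn-from-failure loop has the same shape.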
On his screen, Pathak pulls up a video of a dog-like robot he’s developed, walking up a flight of stairs outside, each step matching the robot’s height. The robot deftly moves up the stairs, adapting in real time to different angles, obstacles, and surface consistencies. “These robots are not just running in the real world, they are continuously adapting,” says Pathak.
Then things get a little crazy. Pathak shows a video of a human opening a drawer, followed by a robot opening the same drawer. By watching how humans use their hands, Pathak’s robots are now able to learn how to do what we do, and do it themselves.
The training method is what Pathak and his colleagues call ‘WHIRL,’ or In-the-Wild Human Imitating Robot Learning. On screen, other robots open refrigerator doors, turn on faucets, close toasters, pick up trash, and even clean a whiteboard. Importantly, this intelligence is ‘general,’ meaning there’s no need to train new models from the ground up for each task. This is similar to humans, who can both open a door and play chess. No AI system is close to being able to operate at that level of generality yet, but perhaps this is a way forward.
Pathak and a Carnegie Mellon colleague, Abhinav Gupta, last year founded a company called Skild AI to build and commercialize a ‘general purpose brain’ for robots. And last month, the company announced a $300 million funding round from SoftBank, Jeff Bezos, and others. Pathak told me he wants to build a robot ‘foundational model’ where you could give a command and it would execute it, no matter the type of robot. It would be “a shared brain that can operate across all kinds of scenarios,” he said.
Imagine that type of ‘common sense’ intelligence combining with today’s large language models, and you could see AI finally being able to reason about the world we inhabit, predict more effectively, and hallucinate less. “The amount of reasoning you are doing physically, in your body, when you do tasks, is way bigger than what you’re saying,” Pathak says. “That’s where the core of intelligence is.”