Humanoid Robots 22 min

How a Humanoid Robot Actually Works: A Visual Guide for Everyone Who Is Not an Engineer

By Robots In Life
Tags: explainer, engineering, beginners, how-it-works, hardware, AI

TL;DR

You have seen the viral videos. A robot walks across a factory floor, picks up a box, and places it on a shelf. But what is actually happening inside that machine? This guide tears open the hood on five core systems that make a humanoid robot work, using real specs from the robots you can actually buy today.

You have seen the videos. A bipedal machine walks across a warehouse floor, bends down, picks up a plastic tote, and sets it on a shelf. Another one does a backflip on a factory demo stage. A third chats with a human visitor in natural language while handing over a cup of coffee.

From the outside, these machines look almost magical. From the inside, they are five engineering systems bolted together and fighting for battery power.

This guide will walk you through each of those five systems, explain what each one actually does, and use real specifications from robots you can track on this site. No equations. No jargon without explanation. Just the honest mechanics of how a humanoid robot goes from standing still to doing useful work.

The machines we will reference throughout this guide

Unitree G1: $16K, 23-43 DoF, 35 kg

Agility Digit: $250K, ~30 DoF, 65 kg

Boston Dynamics Atlas: 56 DoF, 90 kg, enterprise-only

Figure 03: 42 DoF, 61 kg, Helix AI

The five systems at a glance

Before diving into each one, here is the high-level architecture. Every humanoid robot, from a $16,000 Unitree G1 to a multi-million-dollar Boston Dynamics Atlas, runs on the same five core systems. They differ in sophistication, cost, and capability, but the basic structure is universal.

Core systems of a humanoid robot

1. Perception: cameras, LiDAR, IMU, force sensors

2. AI / Planning: foundation models, path planning, task reasoning

3. Locomotion: legs, joints, actuators, balance control

4. Manipulation: arms, hands, grippers, force control

5. Power: battery, power distribution, thermal management

The perception system sees the world. The AI system decides what to do about it. Locomotion moves the body. Manipulation interacts with objects. And power keeps everything running, for as long as the battery allows.

That last part turns out to be the binding constraint on everything else. But we will get to that.

System 1: Locomotion - how it walks without falling over

Walking is something humans do without thinking. For a robot, it is the single hardest mechanical problem to solve.

A bipedal machine is inherently unstable. Unlike a car or a wheeled robot, which sits passively on a stable base, a two-legged robot is constantly falling and catching itself. Every single step is a controlled fall. The locomotion system must calculate hundreds of tiny adjustments per second to keep the center of mass over the feet, or, more precisely, over a constantly shifting “support polygon” defined by whichever foot is on the ground.
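The "center of mass over the support polygon" idea can be shown in a few lines of code. This is a toy static check with invented numbers, not how any shipping controller works (real robots use dynamic criteria such as the zero-moment point):

```python
def com_inside_support_polygon(com_xy, polygon):
    """Ray-casting test: is the center-of-mass ground projection
    inside the support polygon (the foot contact outline)?

    Toy static check only; real balance controllers use dynamic
    criteria, and this geometry is invented for illustration."""
    x, y = com_xy
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray from (x, y) cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

# Single-foot support: a 10 cm x 25 cm footprint
foot = [(0.0, 0.0), (0.10, 0.0), (0.10, 0.25), (0.0, 0.25)]
print(com_inside_support_polygon((0.05, 0.12), foot))  # True: balanced
print(com_inside_support_polygon((0.20, 0.12), foot))  # False: falling
```

When only one foot is on the ground, that polygon shrinks to a single footprint, which is why single-leg stance is where robots wobble most.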

Degrees of freedom: why the number matters

The term “degrees of freedom” (DoF) describes how many independent joints and axes of movement a robot has. Think of it this way: your elbow has one degree of freedom (it bends in one plane). Your shoulder has three (it rotates in three planes). Your entire body has roughly 244 degrees of freedom if you count every joint, from your spine to your toes.


Humanoid robots do not match this number. They prioritize the joints that matter most for their intended tasks and skip the rest.

Degrees of freedom across current humanoid robots

Unitree G1 (base), Consumer / Research: 23 DoF. The base consumer model. Enough for walking and basic grasping.

Unitree G1 (EDU), Consumer / Research: 43 DoF. The research variant adds finger articulation and extra torso joints.

Xiaomi CyberOne, Consumer / Research: 21 DoF. Demonstration prototype from 2022. Limited practical dexterity.

Tesla Optimus Gen 2/3, Industrial / Enterprise: 28 DoF body plus 11 DoF per hand in Gen 3, around 50 total.

Agility Digit, Industrial / Enterprise: ~30 DoF. Optimized for warehouse tote handling rather than general dexterity.

Figure 03, Industrial / Enterprise: 42 DoF, including 16 DoF per hand. Designed for complex assembly line manipulation.

Apptronik Apollo, Industrial / Enterprise: 44+ DoF. Modular design with swappable end-effectors.

Boston Dynamics Atlas, Industrial / Enterprise: 56 DoF. The most articulated humanoid in production. Built for maximum versatility.

Fourier GR-2, Industrial / Enterprise: 53 DoF. Originally from rehabilitation research. Extremely dexterous.

The difference between 23 DoF and 56 DoF is not just a number on a spec sheet. It determines what the robot can physically do. A 23-DoF robot can walk, turn, and grab large objects with a simple gripper. A 56-DoF robot can reach around obstacles, rotate its wrists to unscrew a bolt, and adjust its posture to squeeze through a narrow gap.

Actuators: the muscles

Every degree of freedom needs something to move it. In a humanoid robot, that something is an actuator, typically an electric motor paired with a gearbox. The actuator converts electrical energy into rotational torque, which moves a joint.

The quality of actuators is one of the biggest differentiators between a $16,000 robot and a $250,000 one. Cheap actuators are less precise, generate more heat, and wear out faster under load. Premium actuators (like the ones in Boston Dynamics Atlas or Figure 03) offer higher torque-to-weight ratios, better backdrivability (meaning a human can push the joint and it will give way safely), and tighter position control.

Unitree keeps its G1 affordable partly by using actuators from its existing quadruped robot supply chain. The same motor that drives a Unitree Go2 robotic dog’s leg also drives the G1’s knee joint. This is smart manufacturing, but it means the G1’s actuators are optimized for a 15 kg quadruped, not a 35 kg biped carrying a payload.

At the other end, Boston Dynamics designs custom actuators for Atlas with up to 450 Nm of peak torque, allowing the 90 kg robot to lift 50 kg and perform dynamic movements like running and jumping. Fourier’s GR-2 uses its proprietary FSA 2.0 actuators rated at 380 Nm, which descend from years of rehabilitation robotics research.

Balance control: the hidden software

Hardware alone does not make a robot walk. The balance control loop, a real-time software system running at 500-1000 Hz (500 to 1000 cycles per second), constantly reads data from the robot’s inertial measurement unit (IMU) and joint encoders, then adjusts motor commands to keep the robot upright.

Modern humanoid robots use a combination of two approaches:

Model-based control uses a physics model of the robot’s body. The software knows the exact mass, length, and joint limits of every limb, and it calculates the forces needed to maintain balance using physics equations. This is reliable and predictable, but it struggles with unexpected situations like stepping on a loose rock.

Learned control uses neural networks trained through millions of simulated walking attempts. The AI does not have an explicit physics model. Instead, it has learned patterns: “when the IMU reads this tilt and the left foot senses this force, apply this motor command.” This approach handles surprises better but can behave unpredictably in edge cases.

Most production robots blend both approaches. The Unitree G1 uses reinforcement learning trained in NVIDIA Isaac Sim for locomotion, running on an NVIDIA Jetson Orin processor. Boston Dynamics Atlas uses what the company calls “Large Behavior Models,” combining learned policies with model-based safeguards.

How the balance control loop works (simplified)

1. IMU and joint sensors read the current body state: tilt angle, angular velocity, foot contact force.

2. The balance controller computes a correction: 500-1000 Hz update rate, physics model plus neural network.

3. Motor commands go to the leg actuators: torque targets for hip, knee, and ankle joints.

4. The robot adjusts its posture in milliseconds, and the loop repeats every 1-2 ms.
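To give a feel for that loop, here is a toy proportional-derivative balance controller catching a simulated push. The gains, the pendulum plant, and every number are made up for illustration; no real robot uses anything this simple:

```python
import math

def balance_step(tilt, tilt_rate, kp=400.0, kd=40.0):
    """One iteration of a toy PD balance loop: read the IMU's
    tilt and tilt rate, return a corrective ankle torque.
    Gains are illustrative, not taken from any real robot."""
    return -(kp * tilt + kd * tilt_rate)

# Simulate an inverted-pendulum body being caught after a push
tilt, tilt_rate = 0.10, 0.0   # rad, rad/s: shoved 0.1 rad forward
inertia, gravity_gain = 5.0, 80.0  # invented plant parameters
for _ in range(2000):         # 2 seconds at a 1 kHz loop rate
    torque = balance_step(tilt, tilt_rate)
    accel = (gravity_gain * math.sin(tilt) + torque) / inertia
    tilt_rate += accel * 0.001
    tilt += tilt_rate * 0.001
print(round(tilt, 4))  # settles back near zero: the push is absorbed
```

Without the controller, gravity's term would grow the tilt exponentially; with it, the disturbance dies out in under a second, which is the whole job of the 500-1000 Hz loop.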

System 2: Manipulation - why hands are harder than legs

If locomotion is the hardest mechanical problem, manipulation is the hardest combined mechanical-and-AI problem. Walking is repetitive. The robot does basically the same motion pattern over and over. But picking things up is different every time. A coffee mug, a cardboard box, a screwdriver, and a raw egg all require completely different grip strategies, force levels, and approach angles.

The spectrum of robot hands

Robot hands range from simple parallel grippers (two flat surfaces that squeeze together) to fully articulated five-finger hands with tactile sensors on every fingertip. Where a robot falls on this spectrum tells you almost everything about what tasks it can perform.

Hand dexterity across the market

Unitree G1 (base): basic. Simple gripper, limited grasping.

Figure 03: 16 DoF per hand. Force sensing, fine manipulation.

Tesla Optimus: 11 DoF per hand. Tactile sensing, Gen 3 design.

The Unitree G1 base model ships with a basic gripper. It can pick up a water bottle or a small box. It cannot tie a knot, turn a screwdriver, or handle a thin piece of paper. The EDU variant offers an optional five-finger hand, but its dexterity still falls short of purpose-built industrial hands.

Figure 03’s hands have 16 degrees of freedom each and force sensors that can detect how hard the fingers are squeezing. This allows the robot to handle fragile items and perform assembly tasks that require precise force control, like inserting a connector into a socket or threading a wire through a hole.

Tesla’s Optimus Gen 3 design puts 11 DoF in each hand with tactile sensing across the fingertips. This is fewer joints than Figure 03, but Tesla’s approach uses end-to-end neural networks trained on thousands of hours of manipulation data from its Gigafactories, compensating for fewer mechanical degrees of freedom with more sophisticated AI control.

Payload: the practical bottleneck

Payload capacity, how much weight the robot can carry, is determined by the combined strength of the arm actuators, the structural rigidity of the arm and torso, and the robot’s ability to maintain balance while holding something heavy.

Payload capacity comparison

Unitree G1 (lighter robots): 3 kg. Fine for a water bottle. Cannot move warehouse totes.

Xiaomi CyberOne (lighter robots): 1.5 kg per hand. Demonstration prototype. Very limited practical payload.

Agility Digit (heavy-duty robots): 16 kg. Built for Amazon warehouse totes (typically 10-15 kg).

Tesla Optimus (heavy-duty robots): 20 kg. Handles automotive parts on the Gigafactory line.

Figure 03 (heavy-duty robots): 20 kg. Same 20 kg class as Tesla, different manipulation approach.

Apptronik Apollo (heavy-duty robots): 25 kg. Highest bipedal payload. Hot-swap battery design.

1X NEO (heavy-duty robots): 25 kg carry, 70 kg lift. Musculoskeletal design enables high strength at 30 kg body weight.

Boston Dynamics Atlas (heavy-duty robots): 50 kg lift. The strongest humanoid. Uses its 90 kg mass for leverage.

The Unitree G1’s 3 kg payload is the direct consequence of its 35 kg body weight and consumer-grade actuators. Physics is unforgiving here: a light robot with weak motors simply cannot lift heavy objects without tipping over. The G1 trades payload for portability and affordability.

At the other extreme, Boston Dynamics Atlas can lift 50 kg because it weighs 90 kg itself (providing counterbalance), uses custom high-torque actuators, and has a structural frame designed for heavy loads. But that 90 kg body weight also means Atlas consumes far more energy to walk, which circles back to the battery problem.
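The tipping argument in the last two paragraphs reduces to a static moment balance about the front edge of the foot: the load's moment must not exceed the body's restoring moment. A toy check, with geometry invented purely for illustration:

```python
def tips_over(robot_mass, robot_com_x, load_mass, load_x, pivot_x=0.12):
    """Static tipping check about the front edge of the support foot.
    Moments are mass times horizontal distance from the pivot; the
    common factor of g cancels. Toy model with invented geometry."""
    restoring = robot_mass * (pivot_x - robot_com_x)  # body behind pivot
    toppling = load_mass * (load_x - pivot_x)         # load out in front
    return toppling > restoring

# A 35 kg robot (CoM 7 cm behind the toe) holding a load 40 cm out
print(tips_over(35, 0.05, 3, 0.52))   # False: a 3 kg load is fine
print(tips_over(35, 0.05, 16, 0.52))  # True: 16 kg pulls it over
```

Real robots also lean backward and widen their stance to shift the pivot, but the basic arithmetic is why payload scales with body mass.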

The 1X NEO is an interesting outlier. At just 30 kg body weight, it can carry 25 kg and lift 70 kg. The secret is its musculoskeletal design: instead of rigid gearbox actuators, NEO uses a soft-bodied system with cable-driven artificial muscles that mimic how human tendons work. This is lighter per unit of force, but the technology is newer and less proven at scale.

System 3: Perception - how the robot sees

A humanoid robot’s perception system is its window to the world. Without it, the AI has nothing to reason about and the locomotion system has no idea where to step.

The sensor stack

Every humanoid robot uses a layered sensor approach. No single sensor type can provide all the information the robot needs.

Typical perception sensor stack

1. RGB cameras: color video for object recognition, face detection, reading labels.

2. Depth cameras / stereo vision: 3D distance measurement, obstacle detection, spatial mapping.

3. LiDAR (on some models): precise laser-based distance mapping, works in low light.

4. IMU (inertial measurement unit): tilt, rotation, acceleration; essential for balance.

5. Force/torque sensors: in joints and fingers, measuring contact forces with objects.

6. Joint encoders: precise position of every joint, reported to the balance loop.

The simplest setup, used by the Unitree G1, includes a depth camera, an IMU, and joint encoders. This is enough for basic navigation and object interaction in controlled environments.

The most complex setup, used by Boston Dynamics Atlas, adds stereo cameras, LiDAR, force/torque sensors in every joint, and multiple redundant IMUs. Atlas can map a cluttered factory floor, identify specific parts on a shelf, and feel exactly how much force its fingers are applying to a fragile component.

Tesla takes a camera-only approach for Optimus, mirroring the “Tesla Vision” philosophy from its self-driving cars. No LiDAR. Instead, multiple cameras feed into an end-to-end neural network that extracts depth, object identity, and spatial relationships purely from visual data. This is cheaper per unit but requires massive training data.

Figure 03 uses eight cameras (RGB plus depth) arranged for 360-degree coverage. Combined with the Helix foundation model, these cameras give the robot a continuous understanding of its entire surroundings without needing to turn its head.

Sensor fusion: combining everything

No single sensor provides a complete picture. RGB cameras cannot measure distance accurately. Depth cameras struggle in bright sunlight. LiDAR cannot read text on a label. Force sensors tell you about contact but nothing about what is 10 meters away.

Sensor fusion is the process of combining data from all sensors into a unified model of the world. The perception system creates and continuously updates a 3D map of the robot’s surroundings, tracks moving objects, identifies surfaces the robot can walk on, and labels objects the robot might need to interact with.

This fusion process runs in real time, typically at 30-60 Hz, on the robot’s onboard computer. The Unitree G1 handles this on an NVIDIA Jetson Orin (275 TOPS of AI compute). Boston Dynamics Atlas uses a custom compute platform with GPU acceleration. Apptronik Apollo runs dual NVIDIA Jetson modules (AGX Orin plus Orin NX) to split the workload between perception and planning.
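Sensor fusion at its very simplest is a complementary filter: trust the fast-but-drifting gyro from moment to moment, and let the noisy-but-absolute accelerometer slowly pull the estimate back toward truth. A minimal sketch with invented values; production robots use far richer filters:

```python
import math

def complementary_filter(angle, gyro_rate, accel_x, accel_z,
                         dt=0.01, alpha=0.98):
    """Fuse a drifting gyro with a noisy accelerometer to estimate
    tilt. alpha weights the fast gyro path; (1 - alpha) slowly pulls
    the estimate toward the accelerometer's gravity reading."""
    gyro_estimate = angle + gyro_rate * dt         # integrate rotation
    accel_estimate = math.atan2(accel_x, accel_z)  # tilt from gravity
    return alpha * gyro_estimate + (1 - alpha) * accel_estimate

# Stationary robot with a gyro falsely reporting 0.05 rad/s of drift
angle = 0.0
for _ in range(1000):  # 10 seconds at 100 Hz
    angle = complementary_filter(angle, gyro_rate=0.05,
                                 accel_x=0.0, accel_z=9.81)
print(round(angle, 3))
```

Integrating the gyro alone would have accumulated half a radian of phantom tilt in those 10 seconds; the fused estimate stays within a few hundredths, because the accelerometer keeps anchoring it to gravity.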

System 4: AI and planning - the brain

This is where the greatest revolution in humanoid robotics is happening right now. Five years ago, most robots relied on carefully hand-coded instructions: “move arm to position X, close gripper, lift to position Y.” Today, the leading robots use AI systems that can learn new tasks from a handful of demonstrations and reason about novel situations they have never encountered before.

Traditional programming vs. foundation models

The distinction matters because it determines how quickly a robot can learn new tasks and how well it handles the unexpected.

Traditional (programmed) approach: A human engineer writes code specifying exactly what the robot should do in every situation. If the engineer did not anticipate a specific scenario, the robot either does nothing or does the wrong thing. Adding a new task requires more engineering time. This is how most industrial robots (arms in car factories, for example) have worked for decades.

Foundation model approach: A large neural network is trained on massive datasets of robot demonstrations, human videos, and language descriptions of tasks. Instead of hard-coding specific behaviors, the model learns general principles: “this is what picking something up looks like,” “this is how you navigate around an obstacle,” “this is what a human means when they say put that over there.” When the robot encounters a new situation, it can generalize from its training data rather than needing a new program.

AI systems across the market

Helix (Figure 03): vision-language-action model.

FSD Chip (Tesla Optimus): end-to-end neural network.

GR00T (Apptronik Apollo): NVIDIA foundation model.

What a foundation model actually does

Let us take Figure AI’s Helix model as a concrete example, since it is one of the most publicly documented systems.

Helix is a “vision-language-action” (VLA) model. That name describes its three input/output channels:

Vision: Helix processes raw camera feeds from Figure 03’s eight cameras. It does not just recognize objects (“that is a cup”). It understands spatial relationships (“the cup is on the edge of the table, upright, half-full”), physical properties (“the cup is ceramic, approximately 300 grams”), and affordances (“the cup has a handle that can be grasped from the left side”).

Language: Helix understands natural language instructions. A human supervisor can say “move the blue bin to the second shelf” and the model translates that into a sequence of robotic actions. It also reasons about ambiguity: if there are two blue bins, it can ask for clarification or use context to infer which one.

Action: Helix outputs low-level motor commands, specifying the exact torque, position, and velocity for every joint at every moment. The model does not hand off to a separate motion planning system. It goes directly from understanding (“I need to pick up the blue bin on the left”) to execution (“move shoulder joint to 45 degrees at 30 degrees per second while closing finger joints with 5 N of force”).

How Helix processes a task (simplified)

1. Camera input: 8 cameras, RGB plus depth.

2. Language command: natural language or fleet instruction.

3. Helix VLA model: unified reasoning across all inputs.

4. Motor commands: torque and position for all 42 joints.

This is fundamentally different from the Unitree G1’s approach. The G1 runs learned locomotion policies (trained in simulation) for walking and basic movement, but relies on third-party software for complex task execution. A research lab using a G1 might install a ROS2-based manipulation pipeline that uses separate modules for object detection, grasp planning, and arm control. Each module is distinct, communicates through defined interfaces, and was likely developed by a different team. It works, but it is slower to adapt and more brittle when things go wrong.
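As a rough sketch of what such a modular pipeline looks like, here are three stand-in stages with explicit hand-offs between them. Every function, label, and coordinate is hypothetical, invented for illustration; this is not Figure's, Unitree's, or anyone's actual code:

```python
# Toy modular pipeline: detect -> plan -> control, with each stage
# talking to its neighbor only through a defined data structure,
# in the style of a ROS2-based research setup.

def detect_objects(image):
    """Stand-in for an object-detection module."""
    return [{"label": "blue bin", "xyz": (0.6, -0.2, 0.9)}]

def plan_grasp(obj):
    """Stand-in grasp planner: approach from 10 cm above."""
    x, y, z = obj["xyz"]
    return {"pregrasp": (x, y, z + 0.10), "grasp": (x, y, z)}

def arm_controller(plan):
    """Stand-in controller: emit waypoints for the arm to track."""
    return [plan["pregrasp"], plan["grasp"]]

objs = detect_objects(image=None)
waypoints = arm_controller(plan_grasp(objs[0]))
print(waypoints)  # two waypoints: above the bin, then at the bin
```

The brittleness the text mentions lives in those hand-offs: if the detector mislabels the bin, nothing downstream can recover, because no single model sees the whole picture the way a VLA model does.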

The NVIDIA GR00T ecosystem

A middle path is emerging through NVIDIA’s GR00T (Generalist Robot 00 Technology) foundation model, which several robot manufacturers are integrating. Apptronik Apollo uses NVIDIA’s Jetson AGX Orin combined with the GR00T model for “learning from demonstration,” meaning a human teleoperates the robot through a task a few times, and the AI generalizes from those demonstrations to perform the task autonomously.

Boston Dynamics is also integrating NVIDIA Isaac GR00T with Atlas, alongside Google DeepMind’s Gemini Robotics. This hybrid approach combines different AI strengths: GR00T for general robotic reasoning, Gemini for language understanding and task decomposition, and Boston Dynamics’ own “Large Behavior Models” for athletic locomotion.

Edge compute vs. cloud

Where the AI runs matters for latency, privacy, and reliability.

All production humanoid robots run their real-time control loops (balance, locomotion, collision avoidance) on local hardware. You cannot afford network latency when you are catching yourself from a fall every 2 milliseconds. But the higher-level AI, the foundation model reasoning about what task to do next, can run either locally or in the cloud.

The Unitree G1 runs everything on its NVIDIA Jetson Orin locally. Tesla Optimus uses its custom FSD chip for on-device inference. Figure 03 has a custom AI accelerator on board but also offloads data wirelessly during dock charging. Agility Digit connects to the Arc cloud platform for fleet management and task assignment, with real-time navigation running locally.

The tradeoff is straightforward: local compute means lower latency and no dependency on internet connectivity, but it limits the model size you can run. Cloud compute lets you run larger, more capable models, but introduces latency and requires reliable connectivity.

System 5: Power - the binding constraint

Every engineering decision in a humanoid robot ultimately comes back to one question: how much battery can we fit, and how long will it last?

2-5 hours: the typical battery life range for production humanoid robots

This is the single most important number in the entire specification sheet, and it is the one that gets the least attention in marketing materials. Battery life determines how long the robot can work, which determines whether it can complete a useful shift, which determines whether a business can justify buying one.

Why battery life is so short

A humanoid robot is doing something that batteries were never designed for: powering dozens of high-torque motors continuously while simultaneously running high-performance AI processors.

Consider the energy budget for a single step. The robot must:

  1. Compute the next foot placement (CPU/GPU power draw)
  2. Lift one leg against gravity (hip and knee actuators consuming power)
  3. Swing the leg forward (more actuator power)
  4. Absorb the landing impact (ankle actuator absorbing energy)
  5. Shift body weight (core and opposite leg actuators adjusting)
  6. Maintain upper body stability (arm and torso actuators compensating)

Multiply this by roughly 100 steps per minute of walking, add the constant power draw of cameras, LiDAR, processors, and communication systems, and you get a machine that consumes energy at an enormous rate relative to its battery capacity.
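You can see why runtime is short with one line of arithmetic: capacity divided by total draw. The power figures below are illustrative guesses for a hypothetical mid-size humanoid, not published specs for any robot:

```python
def runtime_hours(battery_wh, locomotion_w, compute_w, sensors_w):
    """Back-of-envelope runtime estimate: pack capacity in
    watt-hours divided by total average draw in watts.
    All power figures used below are illustrative guesses."""
    return battery_wh / (locomotion_w + compute_w + sensors_w)

# Hypothetical: 800 Wh pack, 250 W average walking draw,
# 60 W of AI compute, 20 W of sensors and radios
print(round(runtime_hours(800, 250, 60, 20), 1))  # ~2.4 hours
```

Notice that even an 800 Wh pack, roughly the size of the 1X NEO's 842 Wh battery, buys only a couple of hours once walking and compute are both drawing continuously.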

Battery life and weight across the market

Unitree G1: ~2 hours, 35 kg. Shortest battery life, but also lightest. Smaller battery keeps cost down.

Xiaomi CyberOne: 2-3 hours, 52 kg. Similar battery performance despite being heavier.

Fourier GR-2: ~2 hours (swappable), 63 kg. Swappable battery is a practical workaround for short runtime.

Tesla Optimus: 3-5 hours, 57 kg. Tesla battery expertise shows. Best energy density in class.

Agility Digit: 4 hours, 65 kg. Designed around warehouse shift schedules.

Apptronik Apollo: 4 hours (hot-swap), 73 kg. Hot-swap battery means zero downtime between packs.

Figure 03: 5 hours, 61 kg. Wireless inductive charging. Best battery life in class.

1X NEO: 4 hours (842 Wh), 30 kg. Best battery-to-weight ratio. Musculoskeletal design is energy efficient.

Boston Dynamics Atlas: hot-swap packs, 90 kg. No fixed runtime. Continuous operation via battery swaps.

The engineering tradeoffs

Battery life is not just about stuffing a bigger battery into the torso. Bigger batteries are heavier, and heavier robots consume more energy to move, partially canceling the benefit. This is the fundamental weight-energy paradox of bipedal robotics.
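A toy model makes the paradox concrete. Assume each kilogram of battery adds capacity but also adds mass, which raises the walking power draw. The coefficients below are invented for illustration, not measured from any robot:

```python
def runtime_with_pack(pack_kg, base_mass_kg=50.0,
                      wh_per_kg=250.0, watts_per_kg=6.0,
                      overhead_w=80.0):
    """Toy weight-energy tradeoff: a bigger pack adds capacity
    (wh_per_kg) but also adds mass, raising the power needed to
    walk (watts_per_kg). All coefficients are invented."""
    capacity = pack_kg * wh_per_kg
    draw = (base_mass_kg + pack_kg) * watts_per_kg + overhead_w
    return capacity / draw

# Doubling the pack from 3 kg to 6 kg does NOT double runtime,
# because the heavier robot draws more power per step
print(round(runtime_with_pack(3.0), 2))
print(round(runtime_with_pack(6.0), 2))
```

The returns stay positive but diminish, and a real robot also pays for the extra mass in actuator sizing and structural weight, which this toy model ignores.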

There are only four ways to extend battery life:

1. Better battery chemistry. Tesla has an advantage here. The same lithium-ion cell research that powers Tesla’s cars feeds directly into Optimus battery design. Tesla’s 3-5 hour battery life in a 57 kg robot is the best energy density of any humanoid robot with a fixed battery pack.

2. More efficient actuators. The less energy each joint consumes per movement, the longer the battery lasts. This is why actuator quality correlates so strongly with price. Premium actuators (like those in Atlas and Figure 03) convert a higher percentage of electrical energy into useful mechanical work, with less lost to heat.

3. Lighter structural design. 1X NEO’s 30 kg body weight with 4 hours of battery life demonstrates this approach. By using a soft-bodied musculoskeletal design instead of heavy metal gearboxes, NEO reduces the energy needed for every movement. Less mass to accelerate and decelerate means less energy consumed per step.

4. Hot-swap or continuous charging. Boston Dynamics Atlas and Apptronik Apollo sidestep the battery life problem entirely by using hot-swappable battery packs. An operator (or automated system) can swap a depleted pack for a charged one in seconds, giving effectively unlimited runtime. Figure 03 uses wireless inductive charging at its dock, allowing it to top up during breaks.

Why the gap between $16,000 and $250,000 exists

Now that you understand all five systems, we can answer the question that draws many people to this topic: why does the Agility Digit cost more than fifteen times as much as the Unitree G1?

The price difference maps directly to engineering choices across every system.

Where the extra money goes:

The Unitree G1 uses off-the-shelf actuators shared with its quadruped product line; Agility Digit uses custom actuators optimized for bipedal warehouse work.

The G1 has a basic gripper (or optional five-finger hand) with a 3 kg payload; Digit has purpose-built manipulation arms with a 16 kg payload.

The G1 uses a single depth camera and IMU; Digit uses LiDAR, stereo cameras, IMU, and joint encoders.

The G1 relies on third-party open-source AI via ROS2; Digit runs Agility's proprietary Arc cloud platform with fleet management and over-the-air skill updates.

The G1 gets 2 hours of battery life; Digit gets 4 hours, enough for a practical warehouse shift.

The G1 has IP54 water resistance; Digit is built for the temperature, dust, and vibration conditions of an industrial warehouse.

But the tradeoffs cut both ways:

Digit's $250,000 price makes it accessible only to large enterprises like Amazon; the G1 at $16,000 is within reach of researchers, universities, and well-funded hobbyists.

Digit's closed ecosystem means you cannot modify or extend its software; the G1's open ROS2 SDK means a global community contributes improvements.

Digit requires Agility's Arc cloud platform for most advanced features; the G1's EDU variant can operate fully air-gapped over direct Ethernet.

The G1 is not a bad robot. For its price, it is remarkable. But it is built to a $16,000 budget, and every system reflects that constraint. The actuators are lighter-duty. The sensors are fewer. The hands are simpler. The battery is smaller. The AI relies on whatever the user installs.

Digit is built to a “what does Amazon need to move totes reliably for 4 hours?” specification. Every system is engineered to that requirement, and the price reflects it.

Between these two extremes sits a growing middle tier. Figure 03 at $20,000 (announced target price for future volume production) and 1X NEO at $20,000 represent attempts to deliver industrial-class capabilities at a consumer price point. Whether that is achievable at scale remains to be seen. No one has done it yet.

The path forward: what changes next

Understanding these five systems also helps you understand where the industry is heading.

Locomotion is largely a solved problem for flat indoor environments. The remaining challenges are outdoor terrain, stairs with irregular dimensions, and operation in rain, snow, and ice. Boston Dynamics Atlas handles outdoor conditions down to -20 degrees Celsius. Most other humanoid robots are limited to 0-40 degree Celsius indoor environments.

Manipulation is the most active area of improvement. The gap between what robot hands can do and what human hands can do is still enormous. Expect rapid progress in tactile sensing, force control, and finger dexterity over the next 2-3 years as foundation models trained on manipulation data become more capable.

Perception will continue its shift toward camera-only systems. LiDAR adds cost and weight that manufacturers want to eliminate. Tesla’s camera-only approach for Optimus, if successful, will pressure other manufacturers to follow.

AI is where the biggest gains will come. Foundation models are doubling in capability roughly annually. The transition from “program every task” to “demonstrate a task a few times” to “describe a task in words” is happening now. Figure’s Helix and Boston Dynamics’ Large Behavior Models represent the current frontier. Within 2-3 years, expect robots that can learn most manipulation tasks from natural language instructions alone.

Power remains the hardest constraint to crack. Battery chemistry improves at roughly 5-8% per year in energy density. There is no Moore’s Law for batteries. The practical solutions will be better energy efficiency (lighter robots, better actuators), hot-swap designs for continuous operation, and wireless charging infrastructure built into workplaces.

Where each system stands today

Locomotion: 85%. Largely solved indoors, challenges outdoors.

Manipulation: 40%. Biggest capability gap vs. humans.

Perception: 70%. Good indoors, struggles in outdoor/varied lighting.

AI / Planning: 30%. Foundation models improving fast.

Power: 20%. The binding constraint, slowest to improve.

A practical checklist for evaluating any humanoid robot

The next time you see a humanoid robot announcement, here are the questions that actually matter. Each one maps to one of the five systems.

Locomotion: How many degrees of freedom? What is the walking speed? Can it handle stairs and uneven ground, or only flat floors?

Manipulation: What are the hands? Simple grippers or articulated fingers? What is the payload capacity? Does it have force or tactile sensing?

Perception: What sensors does it use? Camera-only or camera-plus-LiDAR? How many cameras, and what coverage (forward-facing only or 360 degrees)?

AI: What AI system runs it? Is it a foundation model with few-shot learning, or does every task need to be programmed? Can it understand natural language instructions? How many demonstrations does it need to learn a new task?

Power: What is the battery life under realistic work conditions (not “ideal” conditions)? Is the battery hot-swappable? What is the charging time? What is the battery replacement cost and cycle life?

The humanoid robot industry is growing fast. Goldman Sachs projects a $38 billion market by 2035. But behind the headlines and viral videos, these machines are engineering systems built from real components with real limitations. Understanding those five systems, what they do, how they interact, and where the current limits are, turns you from a spectator into someone who can actually evaluate what is real, what is hype, and what is coming next.

Sources

  1. IEEE Spectrum - Guide to Humanoid Robots - accessed 2026-03-28
  2. Boston Dynamics Atlas Technical Overview - accessed 2026-03-28
  3. Figure AI Helix Foundation Model - accessed 2026-03-28
  4. Unitree G1 Product Page and Specifications - accessed 2026-03-28
  5. Agility Robotics Digit Product Page - accessed 2026-03-28
  6. Goldman Sachs - Humanoid Robot Market Forecast - accessed 2026-03-28
  7. NVIDIA Isaac GR00T Foundation Model for Humanoid Robots - accessed 2026-03-28
  8. Tesla Optimus AI and Robotics Overview - accessed 2026-03-28
  9. Apptronik Apollo and NVIDIA Collaboration - accessed 2026-03-28
  10. 1X Technologies NEO Product Page - accessed 2026-03-28
  11. Fourier Intelligence GR-2 Humanoid Platform - accessed 2026-03-28
  12. MIT Technology Review - The Hard Problem of Robot Hands - accessed 2026-03-28
  13. Nature - Advances in Legged Locomotion - accessed 2026-03-28
  14. Science Robotics - Foundation Models for Robotic Manipulation - accessed 2026-03-28
  15. Boston Dynamics Blog - Large Behavior Models for Atlas - accessed 2026-03-28
