ANA Avatar XPRIZE Finals: Reflections and Musings from a Robotics Engineer

Felcjo Ringo
14 min read · Jan 24, 2023

The ANA Avatar XPRIZE Finals competition was held over two days in Long Beach, CA last November. In it, 20 teams from around the world brought their robot avatars to compete for $10 million in prize money. Each team brought something different to the table in terms of expertise, and it was quite interesting to see how well (or how poorly) the human operators were able to control their robots.

The competition floor

What is a Robot Avatar?

A robot avatar is a system where a human can control a robot remotely, ideally in an immersive one-to-one way (though this can be debated). For example, moving one’s arm should cause the robot to move and match the pose of the human arm.

Avatars as a concept are not novel; in fact, the word comes from Sanskrit, where it describes a deity taking on human form. Sci-fi and video games have recently popularized the idea of a human inhabiting a robot avatar, and the technology of the last few years has made such an idea possible (and affordable).

In my opinion, a true robotic avatar should behave indistinguishably from a human. We are still a long way from that, but this competition is meant to stimulate innovation and development toward that ideal.

Types of Robots on Display

Driving Modalities. Most robots at the competition were wheeled, which is not representative of a true robotic avatar but helps a lot when it comes to winning this competition. Some robots had differential drives with tank-style controls, others had omni-wheels for instantaneous movement in any direction, and at least one balanced on two wheels using Segway-style inverted-pendulum control. Yet others had legs and could theoretically walk but were placed on a wheeled platform for the competition. For the brave few who attempted to have their robots walk, the attempts failed spectacularly.
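As an aside, these drive types differ mainly in their kinematics. Below is a minimal sketch of the classic differential-drive mapping from a commanded body velocity to left/right wheel speeds; the wheel radius and track width are illustrative values I made up, not specs from any team's robot.

```python
# Differential-drive kinematics: map a body velocity command
# (v: forward m/s, w: yaw rad/s) to left/right wheel angular speeds.
# WHEEL_RADIUS and TRACK_WIDTH are illustrative, not from any real robot.

WHEEL_RADIUS = 0.10  # meters
TRACK_WIDTH = 0.45   # meters, distance between the two drive wheels

def diff_drive_wheel_speeds(v: float, w: float) -> tuple[float, float]:
    """Return (left, right) wheel angular velocities in rad/s."""
    v_left = v - w * TRACK_WIDTH / 2.0   # linear speed at left wheel contact
    v_right = v + w * TRACK_WIDTH / 2.0  # linear speed at right wheel contact
    return v_left / WHEEL_RADIUS, v_right / WHEEL_RADIUS

# Example: 0.5 m/s forward while yawing left at 0.2 rad/s.
left, right = diff_drive_wheel_speeds(0.5, 0.2)
```

An omni-wheeled base generalizes this same idea to three or four wheels, which is what allows instantaneous motion in any direction.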

Grasping Modalities. It must be noted that, while part of the competition required sensing the relative weights of objects and telling rough from smooth surfaces, some teams did not have a full five-fingered robot hand with which to do this. Not all teams integrated haptics into their systems either, due to time or cost constraints (these teams mostly did not place well). Many teams had an arm/hand system on the robot, generally custom built, although there were quite a few Universal Robots arms, Shadow Robot hands, and some newer entries into the space whose names I didn't catch. Other teams chose basic clamping grippers for one or both hands, likely to be economical (Shadow Robot hands go for $100k!).

The AlterEgo robot, which used Segway-style controls for movement

Methods for Robotic Embodiment

In terms of embodiment, or how best to make the teleoperator feel as if they were inside the robot, teams tried many different approaches. I wasn't able to get close to every robot, but I was able to chat with engineers from certain teams while they were in the pits preparing.

Actuation Methods. There were many different ways in which teams decided to actuate their robots. Some were obvious, others less so; I'll do my best to recount what I could glean from simple observation.

Locomotion. Locomotion control becomes more complex when you realize that the operator is seated and their arms and hands are usually attached to a haptic device. Some teams still used a basic video game controller, where up/down is forward/reverse and left/right is turn. A few used foot pedal boards, so that moving your feet while seated would move the robot forward. Other teams used separate modes for locomotion and grasping, so that in locomotion mode the operator's hands were free to gesture the robot forward. I'm afraid to admit I can't remember much about the other modalities, as I assumed the entire idea of locomotion to be trivial…

Grasping. Most if not all teams tracked the position (3D) or pose (6D) of the user's hands and commanded the arm/hand assembly to match it. This is done using inverse kinematics. For robots with fingered hands, most teams had a glove-like device with servos that could both read finger positions and provide force feedback. Some teams may have employed camera-based hand tracking, which is commercially available and easy to integrate, but I don't recall seeing any.
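To make the inverse-kinematics step concrete, here is a minimal damped-least-squares IK iteration for a planar two-link arm chasing a tracked hand position. This is a textbook sketch for illustration only; the link lengths, damping factor, and iteration count are made up rather than taken from any team's controller.

```python
import numpy as np

# Damped-least-squares IK for a planar 2-link arm (illustrative values).
L1, L2 = 0.35, 0.30  # link lengths in meters

def forward_kinematics(q):
    """End-effector (x, y) for joint angles q = [q1, q2]."""
    x = L1 * np.cos(q[0]) + L2 * np.cos(q[0] + q[1])
    y = L1 * np.sin(q[0]) + L2 * np.sin(q[0] + q[1])
    return np.array([x, y])

def jacobian(q):
    """2x2 Jacobian of end-effector position w.r.t. joint angles."""
    s1, c1 = np.sin(q[0]), np.cos(q[0])
    s12, c12 = np.sin(q[0] + q[1]), np.cos(q[0] + q[1])
    return np.array([[-L1 * s1 - L2 * s12, -L2 * s12],
                     [ L1 * c1 + L2 * c12,  L2 * c12]])

def ik_step(q, target, damping=0.05):
    """One damped-least-squares update toward the tracked hand position."""
    error = target - forward_kinematics(q)
    J = jacobian(q)
    # dq = J^T (J J^T + lambda^2 I)^-1 error
    dq = J.T @ np.linalg.solve(J @ J.T + damping**2 * np.eye(2), error)
    return q + dq

q = np.array([0.3, 0.8])            # current joint angles
hand_target = np.array([0.4, 0.3])  # tracked operator hand position (m)
for _ in range(50):                 # iterate each control tick
    q = ik_step(q, hand_target)
```

The damping term keeps the update stable near singularities (a fully outstretched arm), which matters when an operator yanks their hand around faster than the robot can follow.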

Visual Methods. Many teams employed a VR headset fed by either an HD camera or a wide-angle lens; some used stereo infrared or fisheye cameras. Their VR apps usually displayed a heads-up display (HUD) with information relevant to the robot or mission. Other teams took a different approach, with some using an array of standard computer monitors.

Most robots had some form of interface to display the face of the teleoperator, used to show facial expressions when talking with other humans. There were many unique approaches here as well. Many teams attached a tablet to the robot that streamed video of the operator's face. For those using VR headsets, however, this wasn't feasible.

Some were creative: they showed a stock image of the operator's face, attached a camera to the bottom of the headset pointed at the operator's mouth, and used an (open-source?) ML algorithm to map the lips and lower facial expressions onto the image. Eye tracking in the headset captured eye movement, which was also transferred onto the image. The result lies deep in the uncanny valley, where parts of the image move while others stay static. Some teams used the same technology but, instead of showing the operator's whole face, displayed just the eyes and mouth (less creepy). One team in particular (Dragon Tree Labs) developed their own robot head, consisting of three phone displays arranged like a Japanese Gundam robot head, which displayed the teleoperator's head captured by a wide-angle camera.
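As a rough sketch of that compositing trick (my guess at the general approach, not any team's actual pipeline), one could blend a live mouth-camera crop onto a stock portrait using OpenCV's Poisson blending. The file names, crop coordinates, and paste location here are all hypothetical.

```python
import cv2
import numpy as np

# Hypothetical sketch: blend a live mouth-region crop onto a static
# portrait of the operator. Paths and coordinates are made up.
portrait = cv2.imread("operator_stock_photo.jpg")  # static face image
frame = cv2.imread("mouth_camera_frame.jpg")       # frame from the chin camera

mouth_crop = frame[100:180, 220:380]  # hand-tuned mouth region of the frame
mask = 255 * np.ones(mouth_crop.shape[:2], dtype=np.uint8)

# Center of the region on the portrait where the mouth should land.
mouth_center_on_portrait = (310, 420)

# Poisson blending hides the seam between the live crop and the photo.
composite = cv2.seamlessClone(mouth_crop, portrait, mask,
                              mouth_center_on_portrait, cv2.NORMAL_CLONE)
cv2.imwrite("avatar_face_display.png", composite)
```

Even with seamless blending, the static/moving mismatch is hard to avoid, which is presumably why some teams dropped the photo entirely and rendered only eyes and a mouth.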

NimbRo uses a VR headset with eye-tracking and facial feature detection. Note in the VR app the operator is told both the time of day and how much weight each respective hand is holding.

Audio Methods. This was a bit harder to glean on a robot-to-robot basis. I assume that most robots had a mono microphone that captured audio and played it back to the teleoperator. Perhaps some teams had multiple microphones and placed them as 3D sound sources in their VR app (it would be difficult and overkill to do this outside of a game engine). Some teams took it up a notch and integrated a 3D microphone that captures audio in 360 degrees (3Dio has a commercial product for this). With this, the user can hear and localize sounds in the world, which is the ultimate level of immersion for the teleoperated robot scenario. On the robot side, I believe most robots had basic speakers; nothing too fancy is needed here.
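To give a flavor of the simplest possible spatialization (far cruder than true binaural rendering, and purely illustrative), constant-power panning plus a small interaural delay is enough to place a mono robot microphone stream to the operator's left or right:

```python
import numpy as np

SAMPLE_RATE = 48_000     # Hz
SPEED_OF_SOUND = 343.0   # m/s
HEAD_RADIUS = 0.09       # m, rough half-distance between the ears

def spatialize(mono: np.ndarray, azimuth_rad: float) -> np.ndarray:
    """Pan a mono signal to stereo for a source at the given azimuth.

    azimuth_rad: 0 is straight ahead, +pi/2 is hard right.
    """
    # Constant-power panning: total energy is the same at every angle.
    pan = (azimuth_rad + np.pi / 2) / np.pi  # map [-pi/2, pi/2] -> [0, 1]
    gain_l, gain_r = np.cos(pan * np.pi / 2), np.sin(pan * np.pi / 2)

    # Interaural time difference: the far ear hears the sound later.
    itd = int(abs(HEAD_RADIUS * np.sin(azimuth_rad))
              / SPEED_OF_SOUND * SAMPLE_RATE)
    left, right = gain_l * mono, gain_r * mono
    if azimuth_rad > 0:    # source on the right: delay the left channel
        left = np.concatenate([np.zeros(itd), left[:len(left) - itd]])
    elif azimuth_rad < 0:  # source on the left: delay the right channel
        right = np.concatenate([np.zeros(itd), right[:len(right) - itd]])
    return np.stack([left, right], axis=1)

# Example: place a 1-second 440 Hz tone 45 degrees to the operator's right.
t = np.linspace(0, 1, SAMPLE_RATE, endpoint=False)
stereo = spatialize(np.sin(2 * np.pi * 440 * t), np.pi / 4)
```

A game engine gives you this (plus room acoustics and head tracking) essentially for free, which is why doing it outside one is overkill.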

Touch/Haptic/Feeling Methods. Wow, there were quite a lot of different approaches on display to convey a sense of touch and feeling, from haptic devices to visual feedback to devices that simulate temperature and wind. Conveying a sense of touch to the user is perhaps the most important aspect of a true robotic avatar system, as it allows the operator to feel as if they were really there. It is also the most interesting space in terms of innovation, as visual and audio methods have existed for a long time and are generally solved, save for latency issues that can disrupt the system.

Haptic devices allow for a tighter integration of the human-robot interface; for them to work, there need to be touch or load sensors on the robot and feedback actuators on the human.
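A minimal sketch of that sensor-to-actuator loop is below; the device functions are placeholders standing in for whatever robot and glove APIs a team actually used, and a real system would also handle filtering, latency, and safety limits.

```python
# Hypothetical sensor-to-actuator haptic loop. SENSOR_MAX_KPA and the
# device functions are placeholders, not any team's real hardware API.

SENSOR_MAX_KPA = 80.0  # assumed saturation pressure of a fingertip pad

def pressure_to_feedback(finger_pressures_kpa: list[float]) -> list[float]:
    """Map fingertip pressure readings to actuator commands in [0, 1]."""
    return [min(max(p / SENSOR_MAX_KPA, 0.0), 1.0)
            for p in finger_pressures_kpa]

def haptic_tick(read_fingertip_sensors, set_glove_actuators):
    """One control tick: robot-side sensors drive operator-side actuators."""
    pressures = read_fingertip_sensors()  # e.g., 5 values, one per digit
    set_glove_actuators(pressure_to_feedback(pressures))
```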

  • Touch Sensors + Feedback. One team, Tangible, worked with HaptX to put touch-sensing pads on each digit of their robot's dual Shadow Robot hands. The pads sense dozens of points of pressure, and that data is sent to HaptX's pneumatically actuated gloves, which inflate dozens of tiny air pockets on the tips of the user's fingers. The effect is a fine sense of touch that lets you feel many types of texture. I'm not able to confirm, but I assume other teams had more basic touch sensors, from bump sensors to resistive elements, both of which can transmit one-dimensional pressure information to a rumble motor near the fingertips.
  • Force Feedback. First, a primer. When controlling a robot avatar, it is important not to break it. They are expensive! Take the case where you want the robot to pick up an object with its hand and place it somewhere else. For one, you don't know how heavy the object is; if it is too heavy, it will break the robot's arm/hand or damage the motors. For two, it is not always obvious when your grasp of the object has succeeded: you could be clenching your fist entirely closed to grasp, say, a bottle, and your robot will attempt to match that clenched-fist grasp while holding the bottle, which may lead to material and motor damage. While one can include software that intervenes and stops the robot before anything breaks, it is better to let the operator know the limits of the machine they are controlling. For estimating weight, some teams took motor/servo feedback from the arm, ran it through a weight-estimation algorithm, and sent that information to the user's on-screen display (a sketch of this appears after this list). More advanced teams had the teleoperator sit with their arms attached to another pair of robot arms. In this setup, the operator-side arms can push back with whatever force acts on the avatar's arms, meaning the operator feels the actual weight of objects. For grasping, many teams employed gloves with integrated servos attached to one's fingers. Once actuated, these servos restrict the fingers from closing further, providing the illusion that your hand is actually grasping an object.
Parts of Tangible’s robot on display. I even had the opportunity of doing a demo with it! Note the green finger tips — these are the touch sensors! I like how they are nice and flush with the rest of the finger.
  • Miscellaneous. One team integrated a system that simulated air resistance/wind. They either had wind/pressure sensors on the robot (I'm not sure), or simply read the robot's linear and angular velocities. With this data, they would blast air at the operator in the direction they were moving. A pretty neat concept! There was also functionality to blast A/C or heat in certain directions, but I'm not sure it was used during the competition. I don't know how well it worked inside a contained environment like the Long Beach Convention Center (I don't remember feeling a draft…), but outdoors it must feel surreal.
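Here is the weight-estimation idea from the force-feedback bullet above in sketch form: compare the gravity torque a joint actually reports against what the arm model predicts for an empty hand, and attribute the residual to the payload. This is a one-joint toy model with made-up numbers, not any team's algorithm.

```python
import numpy as np

G = 9.81  # m/s^2

# Toy one-joint model: a single shoulder pitch joint. All values are
# illustrative, not from any real robot.
ARM_MASS = 2.0     # kg, mass of the arm segment
ARM_COM = 0.25     # m, joint to the arm's center of mass
HAND_LEVER = 0.50  # m, joint to the grasped object

def estimate_payload_kg(measured_torque_nm: float,
                        joint_angle_rad: float) -> float:
    """Estimate grasped mass from the extra gravity torque at the joint.

    measured_torque_nm: torque reported by the joint (motor current * k_t).
    joint_angle_rad: 0 = arm horizontal, pi/2 = arm straight up.
    """
    lever_scale = np.cos(joint_angle_rad)  # moment arm shrinks as arm raises
    if abs(lever_scale) < 0.1:
        return float("nan")  # arm near vertical: payload weight unobservable
    expected = ARM_MASS * G * ARM_COM * lever_scale  # bare-arm gravity torque
    residual = measured_torque_nm - expected         # unexplained torque
    return residual / (G * HAND_LEVER * lever_scale)

# Example: arm horizontal, joint reports 9.8 N*m instead of the ~4.9 N*m
# expected for the bare arm -> roughly 1 kg in the hand.
print(estimate_payload_kg(9.8, 0.0))
```

The operator-side force reflection the more advanced teams used skips the estimation entirely: whatever force the avatar's arm feels is simply replayed on the operator's arm.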

Smell Methods. None on display here. This is quite hard to do, as smell is not easily digitizable. There are some startups working on this tech, but expect it to evolve over the next decade.

Taste Methods. None on display here. I don’t think anyone wants this.

The Competition

The competition itself consisted of ten stages, and I’ll detail the major ones here.

The beginnings of the competition

Intro Stage. The first stage was an introductory one in which the teleoperator would drive the avatar up and hear instructions from a human on what to do in the next stages. Here, judges evaluated how well they believed the teleoperator was able to communicate with the human.

Electrical Panel Stage. From there, they would navigate to a mock electrical panel and flip a lever from ‘off’ to ‘on’. This tests the strength of the robot hand and its ability to follow a narrow trajectory.

Robot navigating to the second stage electrical panel

Obstacle Course Stage. Once the lever is flipped, the robot must navigate an 'obstacle course', moving through around 100 meters of floor and rocks. This tripped up a surprising number of robots, in humorous fashion. One humanoid robot simply lost connection and fell on its face. Another operator couldn't see the extremities of their robot's large wheelbase; it got caught on a rock and fell over, smashing its top on the ground with expensive electronics flying everywhere. It reminded me of the early DARPA robotics challenges, where robots just collapsed for no reason (lol).

Weighted Bottle Stage. In this stage, the robot must pick out which bottles in a row are heavier than the others. There were six bottles, and two were heavier. By brute force, the operator must pick up at least two bottles and compare their relative weights. This is a nice step up in difficulty, requiring the robot to have a method of determining weight.

Unscrewing Bolt Stage. After that, the robot had to pick up a power tool and unscrew a bolt holding a 'restricted area' panel in place. Most teams did not get past this stage, as it proved quite difficult. It was also the most boring stage to observe, consisting of many robots fumbling for minutes on end to pick up the power tool or get the alignment with the bolt just right. Hours on end in which nothing remarkable happened! That said, the task is not easy. The avatar system must be responsive to the operator's hand and finger movements in order to grip the power tool and depress its trigger with the index finger. If the motors aren't strong enough to hold a grasp for long, they might release, and the tool will become dislodged or fall. The robot also must be stable as it moves a few feet to the right for the next station; many robots were jittery when moving, and this caused the tool to dislodge.

To unscrew the bolt, the operator must be able to see it, and it sat 1–2 feet below most operators' field of view. Most teams had static cameras mounted on the body, making it difficult to change the viewpoint relative to the robot. NimbRo, the eventual winner of the competition by a large margin, mounted its head and cameras on a third robotic arm, so the teleoperator could lean down and get a better view of the bolt. Actually unscrewing the bolt is a different challenge altogether: lining it up is difficult if the arm/hand assembly is not very responsive, and the assembly must be strong enough to apply a reverse torque for a few seconds. This stage provided some entertainment, however, as it became a 'can they do it??' challenge.

At one very tense point, after the Team UNIST robot had successfully inserted the tool into the bolt and applied a reverse torque several times without the bolt coming free, the operator put the tool into its slowest possible mode. The crowd watched with bated breath as, for the next few minutes, the bolt slowly became looser and looser. Finally, just before time ran out, the bolt came free and the crowd went wild! The operator dropped the tool like a mic and did a little dance after his success.

The extremely tense moment for Team UNIST
Success!

Retrieving Precious Minerals Stage. The final stage was a bit simpler than the previous one but required the avatar to have touch sensors or some way to estimate texture. The robot would drive to a covered booth with five 'moon rocks', two of which had rough textures while the other three were smooth. The operator had to retrieve one of the rough-textured rocks using the robot's hands only. If a team got to this stage, chances were high that they would succeed, but some failed due to the robot hand getting stuck or broken inside the entrance to the booth.

The Pits

All 20 competing teams had taken spaces in the pit area, where the public could peek in and see them; strolling by, you would see repairs being made on robots, teams talking to visitors, and even demos of their work. It was very similar in spirit to FIRST robotics competitions, where middle and high schoolers compete for glory.

There were also several local universities, government organizations, and startups staged there, some showing off research, some bringing their own robots, and some giving demos. I wish I would've taken more pictures here! Courtesy of the German Aerospace Center (DLR), I was able to control a robot in Munich using a Haption arm. I put my arm in the Haption device, which read my 6D pose and had the robot in Germany match it. The arm also allows for gripping objects; I was able to have the robot in Munich quickly grab rod A and insert it into hole B, all from SoCal with acceptable latency. Awesome!

A Haption arm used to control a Kinova arm (in the background) to pick up kids’ blocks and put them into holes. Yes, it’s like we’re two years old again but more fun.

I was also able to try a demo of the Sensiks VR booth, which syncs directional wind and heat in a chamber with a VR experience. Imagine driving a motorcycle down the street and through a ring of fire; you would feel all of that in here. The Sensiks rep mentioned that all the features are currently programmed manually, which makes sense for a VR experience, but it will be neat to see this used for video games or even the real world. They are part of one of the teams and built the real-time device I mentioned in the Miscellaneous haptics section above.

Sensiks VR chamber.

The End

After two days, the competition drew to a close. Only four of the 20 teams were able to complete the entire 10-task course over two runs, and NimbRo from the University of Bonn blew them all out of the water to take first place, finishing in 5:50. Pollen Robotics out of France also had an impressive showing, followed by Team Northeastern.

The Final Rankings

Peter Diamandis, the billionaire serial entrepreneur who founded the XPRIZE Foundation, came out and gave a speech about the competition, congratulating all of the teams for their hard work and determination. He then presented the top three teams with their prizes.

Peter Diamandis speaking about the Avatar XPRIZE

Ultimately, NimbRo won 1st place and received $5 million. Not a sum to sneeze at for a university lab!

The 1st Place Winners!

Of course, this is not the end for robotic avatars in the world. They will become better and better and hopefully (or dreadfully) will become indistinguishable from humans.

Thanks for getting to the end of my article — I hope you enjoyed it. If you have any comments on the competition or robots in general, I’d love to chat. To learn more about the ANA Avatar XPRIZE, check out these links.
