MoCapAct open data library aims to help build “humanoid control models”

You’ve seen the faintly menacing humanoid robot Atlas doing backflips or parkour in viral videos. But organisations wanting to create bipedal robots face real training challenges. It’s an expensive business, creating physical robots and trying to learn from mistakes that tend to break them. There’s no shortage of companies trying to overcome these hurdles, however, not least because wheeled robots still struggle with stairs, fallen logs, debris and even kerbs. As a result, bipedal robots remain an experimental fascination for many organisations – whether they’re thinking of creating super soldiers, care workers or automated factory labour – yet most training of such machines necessarily takes place in computational simulations.

As a recent paper from a robot learning research team at Microsoft notes, “training AI agents with humanoid morphology to match human performance across the entire diversity of human motion is one of the biggest challenges of artificial physical intelligence”. Even in the realm of AI training – in which blobby caricatures of humanity gyrate and stumble on a vast mosaic pavement – simulated efforts to replicate human motion can get expensive very swiftly: reinforcement learning (RL) underpinned by motion capture data involves a trial-and-error approach that generates computational bills out of the reach of most organisations.

Now a team at Microsoft has open sourced (under the CDLA-Permissive-2.0 licence) what it describes as a “multi-task dataset for simulated humanoid control”, as well as the code used to generate the policies to train humanoid robots, in a move intended to “level the playing field and make this critical research area more inclusive”. That move, they hope, will benefit not just those hoping to take their simulations to the physical realm, but also those facing the “challenging and labor-intensive process” of creating and automating realistic animations of human movement under different scenarios for video games and films.

The release of the library and tools – dubbed MoCapAct and available here – comes after the team ran training experiments in which they were able to prompt a simulated human to perform natural motion completion given a simple motion prompt: a physical or motion version, as it were, of the increasingly sophisticated sentence completion capabilities of advanced natural language processing (NLP) models. The team's approach to the project was designed to avoid the huge computational bottlenecks typically associated with training AI agents on motion capture data. (Previous efforts have used about ten billion environment interactions collected by 4000 parallel actor processes running for multiple days.)

https://www.youtube.com/watch?v=0b9aLxnZvtk&t=16s

The ingredients for the work were the extensive CMU Motion Capture Dataset and the “CMU Humanoid” from the popular dm_control humanoid simulation environment, a model that contains 56 joints and is designed to be similar in movement to an average human body. As the team (full citation at bottom) noted: “To our knowledge, there are no agents publicly available that can track all the MoCap data within dm_control.”

In a research paper published this month they explain in detail how, with MoCapAct, they created a dataset of “high-quality experts and episode rollouts for the humanoid in the dm_control package”. For each of over 2500 MoCap clip snippets from the CMU Motion Capture Dataset, it provides a reinforcement learning-trained “expert” control policy (represented as a PyTorch model), which enables dm_control’s simulated humanoid to recreate the skill depicted in that clip snippet. That training, they note, “has taken the equivalent of 50 years over many GPU-equipped Azure NC6v2 virtual machines (excluding hyperparameter tuning and other required experiments) – a testament to the computational hurdle MoCapAct removes…”
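To put that figure in perspective, here is a rough back-of-the-envelope calculation in Python. It is only a sketch: it assumes the "50 years" is total virtual machine time, that it was spread evenly across the clip snippets, and that there were exactly 2500 snippets (the article says "over 2500", so the true average would be somewhat lower). The function and constant names are illustrative and do not come from the MoCapAct code.

```python
# Back-of-the-envelope estimate using the round figures quoted in the article.
# Assumption: the compute was split evenly across snippets (not stated by the team).
TOTAL_COMPUTE_YEARS = 50   # "the equivalent of 50 years" of VM time
NUM_SNIPPETS = 2500        # "over 2500" clip snippets, so this is a lower bound
DAYS_PER_YEAR = 365


def average_days_per_snippet(total_years, num_snippets, days_per_year=DAYS_PER_YEAR):
    """Return the average number of VM-days spent training one snippet's expert."""
    return total_years * days_per_year / num_snippets


days = average_days_per_snippet(TOTAL_COMPUTE_YEARS, NUM_SNIPPETS)
print(f"Roughly {days:.1f} VM-days of training per clip snippet")
```

Under those assumptions, each expert policy represents about a week of training on a single GPU-equipped machine, which is the cost a downstream user avoids by downloading the released experts instead.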

While the dataset and domain may raise concerns about automation, the team noted, "we believe the considered simulated domain is limited enough [i.e. no human or object interaction] to not be of ethical import."

"This work significantly lowers the barrier of entry for simulated humanoid control, which promises to be a rich field for studying multi-task learning and motor intelligence," they concluded.

For those interested in more detail on the training techniques used, the paper is here. Research credit: Nolan Wagener, Andrey Kolobov, Felipe Vieira Frujeri, Ricky Loynd, Ching-An Cheng, Matthew Hausknecht