
Synthesizing Obama: learning lip sync from audio

Bibliographic Details
Published in: ACM Transactions on Graphics, 2017-08, Vol. 36 (4), p. 1-13
Main Authors: Suwajanakorn, Supasorn, Seitz, Steven M., Kemelmacher-Shlizerman, Ira
Format: Article
Language: English
Description
Summary: Given audio of President Barack Obama, we synthesize a high quality video of him speaking with accurate lip sync, composited into a target video clip. Trained on many hours of his weekly address footage, a recurrent neural network learns the mapping from raw audio features to mouth shapes. Given the mouth shape at each time instant, we synthesize high quality mouth texture, and composite it with proper 3D pose matching to change what he appears to be saying in a target video to match the input audio track. Our approach produces photorealistic results.
ISSN: 0730-0301
eISSN: 1557-7368
DOI: 10.1145/3072959.3073640
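Note: The abstract above describes the core learning step as a recurrent network mapping per-frame audio features to mouth shapes. The following is a minimal, hypothetical PyTorch sketch of that kind of audio-to-mouth-shape mapping, not the authors' published code; the feature dimension, hidden size, output dimension, and look-ahead delay are illustrative assumptions.

import torch
import torch.nn as nn

class AudioToMouth(nn.Module):
    # Sketch only: an LSTM maps a sequence of per-frame audio features to a
    # low-dimensional mouth-shape representation (e.g. coefficients of a
    # learned mouth-landmark basis). All dimensions below are assumptions.
    def __init__(self, audio_dim=28, mouth_dim=18, hidden=60, delay=20):
        super().__init__()
        self.delay = delay
        self.lstm = nn.LSTM(audio_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, mouth_dim)

    def forward(self, audio):            # audio: (batch, frames, audio_dim)
        h, _ = self.lstm(audio)
        mouth = self.out(h)              # (batch, frames, mouth_dim)
        # Drop the first `delay` outputs so the prediction aligned with video
        # frame t has already seen audio up to frame t + delay (an assumed
        # way to give the recurrent model a short look-ahead).
        return mouth[:, self.delay:, :]

model = AudioToMouth()
feats = torch.randn(1, 300, 28)          # illustrative per-frame audio features
pred = model(feats)                      # (1, 280, 18) mouth-shape trajectory
print(pred.shape)

In a full pipeline as described by the abstract, the predicted mouth-shape sequence would then drive mouth texture synthesis and 3D-pose-matched compositing into the target video; those stages are beyond this sketch.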