What if you could realistically and convincingly manipulate the facial features of a historical figure, a politician, or a CEO using nothing but a webcam and an illustrated or photographic still image? MarioNETte, a tool recently developed by researchers at Seoul-based Hyperconnect, accomplishes just that, thanks in part to cutting-edge machine learning techniques. The researchers claim it outperforms all baselines even where there’s a “significant” mismatch between the face being manipulated and the person doing the manipulating.
MarioNETte advances the state of the art by incorporating three novel components: an image attention block, target feature alignment, and a landmark transformer. The attention block allows the model to attend to the relevant positions of mapped physical features, while the target feature alignment mitigates artifacts, warping, and distortion. As for the landmark transformer, it adapts the geometry of the driver’s poses to that of the target without the need for labeled data, in contrast to approaches that require human-annotated examples.
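The paper’s code isn’t reproduced here, but the core idea of an image attention block, letting features of the driving pose attend over spatial positions of the target’s feature map, can be sketched in a few lines. Below is a minimal PyTorch sketch under that assumption; the class name, dimensions, and the residual and normalization choices are illustrative rather than MarioNETte’s actual architecture.

```python
# Minimal sketch of an image attention block: driver features query the
# spatial positions of a target feature map. Names and shapes are illustrative.
import torch
import torch.nn as nn


class ImageAttentionBlock(nn.Module):
    """Lets driver-pose features attend over spatial positions of target features."""

    def __init__(self, dim: int = 256, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(embed_dim=dim, num_heads=num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, driver_feat: torch.Tensor, target_feat: torch.Tensor) -> torch.Tensor:
        # driver_feat: (B, C, H, W) features of the driving pose
        # target_feat: (B, C, H, W) features of the still image being animated
        b, c, h, w = driver_feat.shape
        q = driver_feat.flatten(2).transpose(1, 2)   # (B, H*W, C) queries
        kv = target_feat.flatten(2).transpose(1, 2)  # (B, H*W, C) keys/values
        attended, _ = self.attn(q, kv, kv)           # attend to relevant target positions
        out = self.norm(q + attended)                # residual connection + layer norm
        return out.transpose(1, 2).reshape(b, c, h, w)


if __name__ == "__main__":
    block = ImageAttentionBlock(dim=256, num_heads=4)
    driver = torch.randn(1, 256, 16, 16)
    target = torch.randn(1, 256, 16, 16)
    print(block(driver, target).shape)  # torch.Size([1, 256, 16, 16])
```

In the full model, the target feature alignment and landmark transformer would operate alongside a block like this; the snippet only illustrates the attention mechanism itself.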
The researchers trained and tested MarioNETte using VoxCeleb1 and CelebV, two publicly available corpora of celebrity photos and videos. The models and baselines were trained on footage of 1,251 different celebrities from VoxCeleb1 and tested on a set compiled by sampling 2,083 image sets from 100 randomly selected VoxCeleb1 videos (plus 2,000 sets from every celebrity in CelebV).
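For a concrete picture of that sampling step, here is a rough, hypothetical sketch; the paper’s actual protocol, file layout, and frame-selection rules are not reproduced, and the function name, the frames-per-set count, and the data structure are all assumptions.

```python
# Hypothetical sketch of the test-set sampling described above; every name
# and default value below is illustrative, not taken from the paper.
import random
from typing import Dict, List, Tuple


def sample_test_sets(
    frames_by_video: Dict[str, List[str]],  # video id -> list of frame paths
    num_videos: int = 100,                  # videos drawn at random from VoxCeleb1
    num_sets: int = 2083,                   # image sets in the resulting test split
    frames_per_set: int = 8,                # assumed set size; not stated in the article
    seed: int = 0,
) -> List[Tuple[str, List[str]]]:
    """Pick `num_videos` videos at random, then draw `num_sets` image sets from them."""
    rng = random.Random(seed)
    chosen = rng.sample(list(frames_by_video), num_videos)
    test_sets = []
    for _ in range(num_sets):
        video = rng.choice(chosen)
        frames = frames_by_video[video]
        k = min(frames_per_set, len(frames))
        test_sets.append((video, rng.sample(frames, k)))
    return test_sets
```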